Abstract

We consider synchronous distributed systems in which anonymous processors communicate 
by shared read-write variables. The goal is to have all the processors assign unique names to 
themselves. We consider the instances of this problem determined by whether the number n 
is known or not, and whether concurrently attempting to write distinct values into the same 
memory cell is allowed or not, and whether the number of shared variables is a constant independent
of n or it is unbounded. For known n, we give Las Vegas algorithms that operate in
the optimum expected time, as determined by the amount of available shared memory, and use
the optimum O(n log n) expected number of random bits. For unknown n, we give Monte Carlo
algorithms that produce correct output upon termination with probabilities that are 1 - n^{-Ω(1)},
which is best possible when terminating almost surely and using O(n log n) random bits.
1 Introduction


We consider a distributed system in which some n processors communicate using read-write shared 
memory. It is assumed that operations performed on shared memory occur synchronously, in that 
executions of algorithms are structured as sequences of globally synchronized rounds. The model 
of synchronous systems with read-write registers is known as the Parallel Random Access Machine 
(PRAM). It is a generalization of the Random Access Machine model of sequential computation [60] 
to the realm of synchronous concurrent processing. 

We study the problem of assigning unique integer names from the interval [1, n] to the n processors
of a PRAM, when originally the processors do not have distinct identifiers. This task is called
naming and is understood such that all the processors cooperate by executing a distributed algorithm
to assign unique names to themselves. We assume that the original anonymous processors
do not have any feature facilitating identification or distinguishing one from another. When processors
of a distributed/parallel system are anonymous then the task of assigning unique identifiers
to all processors is a key step in making the system fully operational, because names are needed
for executing deterministic algorithms.

The task to assign unique names to anonymous processes by themselves in distributed systems 
can be considered as a stage in either building such systems or making them fully operational. 
Correspondingly, this may be categorized as either an architectural challenge or an algorithmic 
one. For example, tightly synchronized message passing systems are typically considered under the
assumption that processors are already equipped with unique identifiers. This is because such systems
impose strong demands on the architecture and the task of assigning identifiers to processors
is modest when compared to providing synchrony. Similarly, when synchronous parallel machines
are designed, then processors may be identified by how they are attached to the underlying communication
network. In contrast to that, PRAM is a virtual model in which processors communicate
via shared memory; see an exposition of PRAM as a programming environment given by Keller et
al. [41]. This model does not assume any relation between the shared memory and the processors
that would be conducive to identifying processors. 

Distributed systems with shared read-write registers are usually considered to be asynchronous. 
Synchrony in such environments can be added by simulation rather than by a supportive architecture
or an underlying communication network. Processes do not need to be hardware nodes; instead,
they can be virtual computing agents. When a synchronous PRAM is considered, as obtained by a
simulation, then the underlying system architecture does not facilitate identifying processors, and 
so we do not necessarily expect that processors are equipped with distinct identifiers at the start 
of a simulation. 

We view PRAM as an abstract construct which provides a distributed environment to develop 
algorithms with multiple agents/processors working concurrently; see Vishkin [61] for a comprehensive
exposition of PRAM as a vehicle facilitating parallel programming and harnessing the power
of multi-core computer architectures. Assigning names to processors by themselves in a distributed
manner is a plausible stage in an algorithmic development of such environments, as it cannot be 
delegated to the stage of building hardware of a parallel machine. 

We consider two categories of naming problems depending on how much shared memory is
available for a PRAM. In one case, the memory is bounded, in that just a constant number of
memory cells is available. This means that the amount of memory is independent of the number
of processors n but as large as needed in an algorithm’s design. In the other case, the number
of shared memory cells is unbounded, in the sense that it is unlimited in principle but how much is
actually used by an algorithm depends on n. When it is assumed that an unbounded amount of
memory cells is available, then the expected number of memory cells that are actually used may
be considered as a performance metric.

PRAM Model    Memory        Time          Algorithm
Arbitrary     O(1)          O(n)          Arbitrary-Bounded-LV in Section 4
Arbitrary     O(n/log n)    O(log n)      Arbitrary-Unbounded-LV in Section 5
Common        O(1)          O(n log n)    Common-Bounded-LV in Section 6
Common        O(n)          O(log n)      Common-Unbounded-LV in Section 7

Table 1: Four naming problems for known n, as determined by the PRAM model and
the available amount of shared memory, with the respective performance bounds. All
algorithms are Las Vegas.

Independently of the amount of shared memory available, we consider two versions of the naming 
problem, determined by the semantics of concurrent writing in the underlying model of computation. 
This is represented by the corresponding PRAM variants, which are either the Arbitrary PRAM or
the Common PRAM. The Arbitrary PRAM allows concurrent attempts to write distinct values
into a register, in which case an arbitrary one of them gets written. The Common PRAM variant
allows only equal values to be concurrently written into a register.

Randomized algorithms are typically categorized as either Las Vegas or Monte Carlo; this categorization
is understood as follows. A randomized algorithm is Las Vegas when it terminates almost
surely and the algorithm returns a correct output upon termination. A randomized algorithm is 
Monte Carlo when it terminates almost surely and an incorrect output may be produced upon 
termination, but the probability of error converges to zero with the number of processors growing 
unbounded. 

We say that a parameter of an algorithmic problem is known when it can be used in the code
of an algorithm. When the number of processors n is known, then we give Las Vegas algorithms
for each of the four cases of naming determined by the kind of PRAM model and the amount of 
shared memory. When the number of processors is unknown, then we give Monte Carlo algorithms 
for each of the respective four cases of naming. 

The summary of the results. We consider randomized algorithms executed by anonymous 
processors that operate in a synchronous manner using read-write shared memory with the goal to 
assign unique names to the processors. The algorithms have to be randomized (no deterministic
ones exist) and when the number of processors is unknown then they need to be Monte Carlo (no Las
Vegas ones exist).

We show that naming algorithms for n processors using C > 0 shared memory cells need to
operate in Ω(n/C) expected time on an Arbitrary PRAM, and in Ω(n log n/C) expected time on a
Common PRAM. We prove additionally that any naming algorithm needs to work in the expected
time Ω(log n); this bound is relevant only when there is an unbounded supply of shared memory. We
show that, for unknown n, a Monte Carlo naming algorithm that uses O(n log n) random bits has
to fail to assign unique names with probability that is n^{-Ω(1)}.

PRAM Model    Memory        Time          Algorithm
Arbitrary     O(1)          O(n)          Arbitrary-Bounded-MC in Section 8
Arbitrary     unbounded     polylog       Arbitrary-Unbounded-MC in Section 9
Common        O(1)          O(n log n)    Common-Bounded-MC in Section 10
Common        unbounded     polylog       Common-Unbounded-MC in Section 11

Table 2: Four naming problems for unknown n, as determined by the PRAM model
and the available amount of shared memory, with the respective performance bounds.
All the algorithms are Monte Carlo. When time is marked as “polylog” then this
means that the algorithm comes in two variants, such that in one the expected time is
O(log n) and the amount of used shared memory is suboptimal and in the other
the expected time is suboptimal O(log^2 n) but the amount of used shared memory
misses optimality only by at most a logarithmic factor.

We consider eight specific naming problems for PRAMs. They are determined by the following 
independent specifications: whether n is known or not, what is the amount of shared memory
(constant versus unbounded), and by the PRAM variant (Arbitrary versus Common). 

For the case of known n, the naming algorithms we give are all Las Vegas. The naming problems’ 
specifications and the corresponding algorithms with their performance bounds are summarized in 
Table 1. These algorithms operate in asymptotically optimal times, for a given amount of shared
memory, and use the optimum expected number O(nlogn) of random bits. When the amount of 
memory is unbounded, they use only the amount of space that is provably necessary to attain their 
running-time performance. 

For the case of unknown n, the naming algorithms we give are all Monte Carlo. The list of the
naming problems’ specifications and the corresponding algorithms with their performance bounds
is summarized in Table 2. All Monte Carlo algorithms that we give have polynomial probability
of error, which is best possible when using the O(n log n) expected number of random bits. When
the shared memory is bounded, then these algorithms operate in asymptotically optimal times, for
bounded memory, and use the optimum expected number O(n log n) of random bits. When there
is an unbounded supply of shared memory, then we give two variants of the algorithms for Arbitrary
PRAM and two for Common PRAM, with the goal to optimize different performance metrics. The 
set of integers used as names always makes a contiguous segment starting from the smallest name 1, 
so that the only possible kind of error is in assigning duplicate names. 

Previous and related work. The naming problem for a synchronous PRAM has not been 
previously considered in the literature, to the best of the authors’ knowledge. There is a voluminous
literature on various aspects of computing and communication in anonymous systems; the following
review is necessarily selective.
We begin with the previous work on naming in shared-memory systems with read-write registers.
A systematic exposition of shared-memory algorithms can be found in [?], when approached from
the distributed-computing perspective, and in [39], when approached from the parallel-computing
one.

Lipton and Park [?] considered naming in asynchronous distributed systems with read-write
shared memory controlled by adaptive schedulers; they proposed a solution that terminates with
positive probability, which can be made arbitrarily close to 1 assuming that n is known.
Egecioglu and Singh [28] proposed a polynomial-time Las Vegas naming algorithm for asynchronous
systems with known n and read-write shared memory with oblivious scheduling of events. Kutten et
al. [?] provided a thorough study of naming in asynchronous systems of shared read-write memory.
They gave a Las Vegas algorithm for an oblivious scheduler for the case of known n, which works in
the expected time O(log n) while using O(n) shared registers, and also showed that a logarithmic
time is required to assign names to anonymous processes. Additionally, they showed that if n is
unknown then a Las Vegas naming algorithm does not exist, and a finite-state Las Vegas naming
algorithm can work only for an oblivious scheduler. Panconesi et al. [?] gave a randomized wait-free
naming algorithm for anonymous systems with processes prone to crashes that communicate by
single-writer registers. The model considered in that work assigns unique single-writer registers to
nameless processes and so has a potential to defy the impossibility of wait-free naming for general
multi-writer registers, an impossibility proved by Kutten et al. [?]. Buhrman et al. [18] considered
the relative complexity of naming and consensus problems in asynchronous systems with shared
memory that are prone to crash failures, demonstrating that naming is harder than consensus.

Next, we review work on problems in anonymous distributed systems different from naming. 
Aspnes et al. [7] gave a comparative study of anonymous distributed systems with different communication
mechanisms, including broadcast and shared-memory objects of various functionalities,
like read-write registers and counters. Alistarh et al. [3] gave randomized renaming algorithms
that act like naming ones, in that process identifiers are not referred to; for more on renaming
see [?]. Aspnes et al. [8] considered solving consensus in anonymous systems with infinitely
many processes. Attiya et al. [?] and Jayanti and Toueg [?] studied the impact of initialization of
shared registers on solvability of tasks like consensus and wakeup in fault-free anonymous systems.
Bonnet et al. [?] considered solvability of consensus in anonymous systems with processes prone
to crashes but augmented with failure detectors. Guerraoui and Ruppert [?] showed that certain
tasks like time-stamping, snapshots and consensus have deterministic solutions in anonymous systems
with shared read-write registers and with processes prone to crashes. Ruppert [?] studied
the impact of anonymity of processes on wait-free computing and mutual implementability of types
of shared objects.

The problem of concurrent communication in anonymous networks was first considered by 
Angluin [3]. That work showed, in particular, that randomization was needed in naming algorithms 
when executed in environments that are perfectly symmetric; other related impossibility results were 
surveyed by Fich and Ruppert [32].

The work about anonymous networks that followed was either on specific network topologies or 
on problems in general message-passing systems. The most popular specific topologies were the
ring and the hypercube. In particular, the ring topology was investigated by Attiya et al. [?],
Flocchini et al. [33], Diks et al. [?], Itai and Rodeh [38], and Kranakis et al. [?], and the hypercube
topology was studied by Kranakis and Krizanc [33] and Kranakis and Santoro [46].
The work on algorithmic problems in anonymous networks of general topologies or anonymous/named
agents in anonymous/named networks included the following contributions. Afek
and Matias [?] and Schieber and Snir [?] considered leader election, finding spanning trees and
naming in general anonymous networks. Angluin et al. [5] studied adversarial communication by
anonymous agents and Angluin et al. [?] considered self-stabilizing protocols for anonymous asynchronous
agents deployed in a network of unknown size. Chalopin et al. [19] studied naming and
leader election in asynchronous networks when a node knows the map of the network but its position
on the map is unknown. Chlebus et al. [20] considered assigning names to anonymous stations
attached to a channel that allows only beeps to be heard. Chlebus et al. [21] investigated anonymous
complete networks whose links and nodes are subject to random independent failures in which
a single fault-free node has to wake up all nodes by propagating a wakeup message through the network.
Dereniowski and Pelc [25] considered leader election among anonymous agents in anonymous
networks. Dieudonné and Pelc [26] studied teams of anonymous mobile agents in networks that execute
a deterministic algorithm with the goal to convene at one node. Fraigniaud et al. [34] considered
naming in anonymous networks with one node distinguished as leader. Gąsieniec et al. [35] investigated
anonymous agents pursuing the goal to meet at a node or edge of a ring. Glacet et al. [36]
considered leader election in anonymous trees. Kowalski and Malinowski [42] studied named agents
meeting in anonymous networks. Kranakis et al. [45] investigated computing boolean functions on
anonymous networks. Métivier et al. [?] considered naming anonymous unknown graphs. Michail
et al. [?] studied the problems of naming and counting nodes in dynamic anonymous networks.
Pelc [55] considered activating an anonymous ad hoc radio network from a single source by a deterministic
algorithm. Yamashita and Kameda [62] investigated topological properties of anonymous
networks that are conducive to have deterministic solutions for representative algorithmic problems.

General questions of computability in anonymous message-passing systems implemented in networks
were studied by Boldi and Vigna [16], Emek et al. [29], and Sakamoto [58].

Lower bounds on PRAM were given by Fich et al. [?], Cook et al. [23], and Beame [?], among
others. A review of lower bounds based on the information-theoretic approach is given by Attiya and
Ellen [10]. Yao’s minimax principle was given by Yao [63]; the book by Motwani and Raghavan [?]
gives examples of applications.


2 Technical Preliminaries 

A distributed system with shared memory in which some n processors operate concurrently is the 
model of computation that we use in this paper. The essential properties of such systems that 
we assume are, first, that shared memory cells have only reading/writing capabilities, and, second, 
that operations of accessing shared registers are globally synchronized so that processors work in 
lockstep. 

An execution of a synchronous algorithm is structured as a sequence of rounds so that each 
processor performs either a read from a shared memory cell or a write to a shared memory cell in 
a round. We assume that a processor carries out its private computation in a round in a negligible 
portion of the round. An invocation of either reading from or writing to a memory location is 
completed in the round of invocation. This model of computation is referred to in the literature as 
the Parallel Random Access Machine (PRAM); see [39, 56].

PRAM is usually defined as a model with an unbounded number of shared-memory cells, by analogy
with the random-access machine (RAM) model for sequential computing [60]. In this paper, we
consider the following two instantiations of the PRAM model, determined by the amount of the 
available shared memory. In one situation, there is a constant number of shared memory cells, 
which is independent of the number of processors n but as large as needed in the specific algorithm. 
In the other case, the number of shared memory cells is unbounded in principle, but the expected 
number of shared registers accessed in an execution depends on n and is sought to be minimized. 

Each shared memory cell is assumed to be initialized to 0 as a default value. This assumption 
simplifies the exposition, but it is not crucial as any algorithm assuming such an initialization can 
be modified to work with dirty memory; for example, one can apply an approach similar to that 
in [?]. A shared memory cell can store any value as needed in algorithms, in particular, integers of
magnitude that may depend on n; all our algorithms require a memory cell to store O(log n) bits.
Processors can generate as many private random bits per round as needed; all these random bits 
generated in an execution are assumed to be independent. 

PRAM variants. Two operations are said to be performed concurrently when they are invoked 
in the same round of an execution of an algorithm on a PRAM. A concurrent read occurs when 
a group of processors read from the same memory cell in the same round; this results in each of 
these processors obtaining the value stored in the memory cell at the end of the preceding round. 
A concurrent write occurs when a group of processors invoke a write to the same memory cell in 
the same round. 

Without loss of generality, we may assume that a concurrent read of a memory cell and a 
concurrent write to the same memory cell do not occur simultaneously: this is because we could 
designate rounds only for reading and only for writing depending on their parity, thereby slowing 
the algorithm by a factor of two. 

The meaning of concurrent reading from the same memory cell is straightforward, in that all 
the readers get the value stored in this memory cell. We need to specify which value gets written 
to a memory cell in a concurrent write, when multiple distinct values are attempted to be written. 
Such stipulations determine the corresponding variants of the model. We will consider algorithms 
for the following two PRAM variants determined by their respective concurrent-write semantics. 

Common PRAM is defined by the property that when a group of processors want to write to the 
same shared memory cell in a round then all the values that any of the processors want to 
write must be identical, otherwise the operation is illegal. Concurrent attempts to write the 
same value to a memory cell result in this value getting written in this round. 

Arbitrary PRAM allows attempts to write any legitimate values to the same memory cell in the
same round. When this occurs, then one of these values gets written, and the selection of the
written value is arbitrary. All possible selections of values that get written need to be taken
into account when arguing about correctness of an algorithm.
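The two concurrent-write semantics can be illustrated by a minimal Python sketch. The function name `resolve_concurrent_write` and the string labels for the variants are hypothetical, introduced only for this illustration; the random choice merely stands in for the arbitrary selection, which in the model is not assumed to follow any distribution.

```python
import random

def resolve_concurrent_write(values, variant, rng=random):
    """Return the value stored in one cell after the given concurrent write attempts."""
    if not values:
        raise ValueError("no processor writes in this round")
    if variant == "common":
        # Common PRAM: all values written concurrently to one cell must be identical.
        if len(set(values)) != 1:
            raise ValueError("illegal concurrent write on a Common PRAM")
        return values[0]
    if variant == "arbitrary":
        # Arbitrary PRAM: one of the attempted values gets written. A random pick is
        # only a stand-in; a correct algorithm must tolerate every possible choice.
        return rng.choice(values)
    raise ValueError("unknown PRAM variant: " + str(variant))
```

An algorithm that is correct on an Arbitrary PRAM must remain correct if `rng.choice` is replaced by any other selection rule.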

We will rely on certain standard algorithms developed for PRAMs, as explained in [39, 56].
One of them is for prefix-type computations. A typical situation in which it is applied occurs when 
there is an array of m shared memory cells, each memory cell storing either 0 or 1. This may 
represent an array of bins where 1 stands for a nonempty bin while 0 for an empty bin. Let the 
rank of a nonempty bin of address x be the number of nonempty bins with addresses smaller than 
or equal to x. Ranks can be computed in time O(log m) by using an auxiliary memory of O(m)
cells, assuming there is at least one processor assigned to each nonempty bin, while other processors
do not participate; such assignment for anonymous processors will be determined by writes they 
performed to the bins. The underlying idea is that bins are associated with the leaves of a binary 
tree. The processors traverse a binary tree from the leaves to the root and back to the leaves. 
When updating information at a node, only the information stored at the parent, the sibling and 
the children is used. 
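The tree traversal can be emulated sequentially, as in the following Python sketch. It is an illustration under our own assumptions, not the PRAM implementation from [39, 56]: the heap-style array layout and the name `ranks_by_tree` are hypothetical, and the loops visit nodes one at a time, whereas on a PRAM all nodes of one tree level would be handled in the same round, giving O(log m) rounds.

```python
def ranks_by_tree(bins):
    """For each nonempty bin x return the number of nonempty bins with address <= x."""
    m = len(bins)
    size = 1
    while size < m:
        size *= 2
    # Heap layout: node i has children 2i and 2i+1; the leaves are size .. 2*size-1.
    total = [0] * (2 * size)
    for x, b in enumerate(bins):
        total[size + x] = b
    # Leaves-to-root pass: a node stores the number of nonempty bins in its subtree.
    for i in range(size - 1, 0, -1):
        total[i] = total[2 * i] + total[2 * i + 1]
    # Root-to-leaves pass: before[i] counts nonempty bins to the left of i's subtree,
    # computed from the parent's value and the left sibling's subtree count.
    before = [0] * (2 * size)
    for i in range(1, size):
        before[2 * i] = before[i]
        before[2 * i + 1] = before[i] + total[2 * i]
    return [before[size + x] + 1 if bins[x] else None for x in range(m)]
```

Empty bins receive `None`, since the rank is defined only for nonempty bins.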

We may observe that the same memory can be used repeatedly when such computation needs 
to be performed multiple times on the same tree. A possible approach is to verify if the information 
at a needed memory cell, representing either a parent, a sibling or a child of a visited node, is fresh 
or rather stale from previous executions. This could be accomplished in the following three steps 
by a processor. First, the processor erases a memory cell it needs to read by rewriting its present 
value by a blank value. Second, the processor writes again the value at the node it is visiting, which 
may have been erased in the previous step by other processors that need the value. Finally, the 
processor reads again the memory cell it has just erased, to see if it stays erased, which means its 
contents were stale, or not, which in turn means its contents got rewritten so they are fresh. 

Balls into bins. In the course of probabilistic analysis of algorithms, we will often model actions 
of processors by throwing balls into bins. This can be done in two natural ways. One is such 
that memory addresses are interpreted as bins and the values written represent balls, possibly with 
labels. Then the total number of balls considered will always be n, that is, equal to the number 
of processors of a PRAM. Another possibility is when bins represent rounds and selecting a bin 
results in performing a write to a suitable shared register in the respective round. 

The following terms refer to the status of a bin in a given round. A bin is called empty when 
there are no balls in it. A ball is single in a bin when there are no other balls in the same bin, 
and such a bin can be called singleton. A bin is multiple when there are at least two balls in it. A 
collision occurs in a multiple bin. Finally, a bin with at least one ball is occupied.

The rank of a bin containing a ball is the number of bins with smaller or equal names that 
contain balls. When each processor, in a group of processors that still seek names, throws a ball 
and there is no collision then this breaks symmetry in a manner that in principle could facilitate 
assigning unique names to processors in the group, namely, the ranks of selected bins may serve as 
names. 
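As an illustration of how ranks of selected bins may serve as names, consider the following Python sketch of a single throw. The function `try_assign_names` is hypothetical, and the collision test inspects all choices at once, which is a global view that anonymous processors do not have; it is used here only to show the outcome of one throw.

```python
import random

def try_assign_names(n, num_bins, rng):
    """Throw n balls into num_bins bins; return names by rank, or None upon a collision."""
    choice = [rng.randrange(num_bins) for _ in range(n)]
    if len(set(choice)) < n:
        return None  # some bin is multiple, so symmetry is not fully broken
    # Without collisions, the rank of a bin is its position among the occupied bins.
    rank = {b: i + 1 for i, b in enumerate(sorted(choice))}
    return [rank[b] for b in choice]
```

When the throw succeeds, the returned names form a permutation of 1, ..., n.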

Throwing balls into bins will be performed repeatedly in each instance of modeling the behavior 
of an algorithm. Each instance of throwing a number of balls into bins is then called a stage. There 
will be an additional numeric parameter β > 0, and we call the process of throwing balls into bins
the β-process, accordingly. This parameter β may determine the number of bins in a stage and
also when a stage is the last one in an execution of the β-process. The specifications of any such
β-process apply only within the section in which it is defined.

In a given stage of a β-process, the balls that are thrown into the bins are called eligible for the
stage; all the n balls are eligible for the first stage. When a bin is selected for a ball to be placed in,
then this occurs uniformly at random over the range of bins, and independently over the considered
balls. Each selection of a bin for an eligible ball requires the number of random bits equal to the
binary logarithm of the number of the available bins.

When we sum up the numbers of available bins over all the stages of an execution of a β-process
until termination, then the result is the number of bins ever needed in this execution. Similarly,
the number of bits ever generated in an execution of a β-process is the sum of all the numbers of
random bits needed to be generated to place balls, over all the stages and balls until termination
of this execution.

Procedure Verify-Collision (x)

    initialize Heads[x] ← Tails[x] ← false
    toss_v ← outcome of tossing a fair coin
    if toss_v = tails
        then Tails[x] ← true
        else Heads[x] ← true
    return (Tails[x] = Heads[x])

Figure 1: A pseudocode for a processor v of a Common PRAM, where x is a positive
integer. Heads and Tails are arrays of shared memory cells. When the parameter x
is dropped in a call then this means that x = 1. The procedure returns true when a
collision is detected.

The idea of representing attempts to assign names as throwing balls into bins is quite generic. 
In particular, it was applied by Egecioglu and Singh [28], who proposed a synchronous algorithm 
that repeatedly throws all balls together into all available bins, the selections of bins for balls made 
independently and uniformly at random. In their algorithm for n processors, we can use γ · n
memory cells, where γ > 1. Let us choose γ = 8 for the following calculations to be specific. This
algorithm has an exponential expected-time performance. To see this, we estimate the probability
that each bin is either singleton or empty. Let the balls be thrown one by one. After the first n/2
balls are in singleton bins, the probability to hit an empty bin is at most (8n - n/2)/(8n) = 15/16;
we treat this as a success in a Bernoulli trial. The probability of n/2 such successes is at most
(15/16)^{n/2}, so the expected time to wait for the algorithm to terminate is at least (16/15)^{n/2},
which is exponential in n.
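The probability estimated above can also be computed exactly, as in the following Python sketch. The function `prob_no_collision` and its integer parameter `gamma` are our own illustrative names; the sketch assumes that all n balls are thrown independently and uniformly at random into gamma · n bins.

```python
from fractions import Fraction

def prob_no_collision(n, gamma):
    """Exact probability that n balls thrown into gamma*n bins land in distinct bins."""
    bins = gamma * n
    p = Fraction(1)
    for k in range(n):
        # The (k+1)-st ball has to avoid the k bins that are already occupied.
        p *= Fraction(bins - k, bins)
    return p
```

Since successive attempts are independent, the number of attempts until the first success is geometric, so the expected number of attempts is the reciprocal of this probability.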

We consider related processes that could be as fast as O(log n) in expected time, while still
using only O(n) shared memory cells, see Section 7. The idea is to let balls in singleton bins stay
put and only move those that collided with other balls by landing in bins that became thereby 
multiple. To implement this on a Common PRAM, we need a way to detect collisions, which we 
explain next. 
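Before turning to collision detection, the process just described can be simulated by a short Python sketch. It is a simplified illustration, not the algorithm of Section 7: the function `stages_until_all_single` is hypothetical, collisions are detected by directly counting the balls in each bin rather than by the processors themselves, and the number of bins is assumed to be at least n so that the loop terminates almost surely.

```python
import random
from collections import Counter

def stages_until_all_single(n, num_bins, rng):
    """Count stages until every ball is single, rethrowing only the collided balls."""
    position = [rng.randrange(num_bins) for _ in range(n)]
    stages = 1
    while True:
        load = Counter(position)
        collided = [i for i in range(n) if load[position[i]] > 1]
        if not collided:
            return stages, position
        for i in collided:
            # A rethrown ball may land in any bin, including an occupied one.
            position[i] = rng.randrange(num_bins)
        stages += 1
```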

Verifying collisions. We will use a randomized procedure for Common PRAM to verify if a 
collision occurs in a bin, say, a bin x, which is executed by each processor that selected bin x. This 
procedure Verify-Collision is represented in Figure 1. There are two arrays Tails and Heads
of shared memory cells. Bin x is verified by using memory cells Tails[x] and Heads[x]. First, the
memory cells Tails[x] and Heads[x] are set each to false. Next, each processor selects randomly
and independently one of these memory cells and sets it to true. Finally, every processor
reads both Tails[x] and Heads[x] and detects a collision upon reading true twice.





Lemma 1 For an integer x, procedure Verify-Collision (x) executed by one processor never
detects a collision, and when multiple processors execute this procedure then a collision is detected
with probability at least 1/2.

Proof: When only one processor executes the procedure, then first the processor sets both Heads[x]
and Tails[x] to false and next only one of them to true. This guarantees that Heads[x] and Tails[x]
store different values and so a collision is not detected. When some m > 1 processors execute the
procedure, then a collision is not detected only when either all processors set Heads[x] to true or all
processors set Tails[x] to true. This means that the processors generate the same outcome in their
coin tosses. This occurs with probability 2^{-m+1}, which is at most 1/2. □
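The behavior stated in the lemma can be checked empirically with a sequential Python simulation of the procedure for a single bin. This is a sketch under our own assumptions: the function `verify_collision` is hypothetical, and the concurrent writes of the processors are modeled by a loop over their coin tosses.

```python
import random

def verify_collision(num_processors, rng):
    """Simulate Verify-Collision for one bin; return True when a collision is detected."""
    if num_processors < 1:
        raise ValueError("at least one processor must execute the procedure")
    heads = tails = False  # every processor writes false to both cells
    for _ in range(num_processors):
        if rng.random() < 0.5:
            tails = True   # concurrent writes of the same value are legal on a Common PRAM
        else:
            heads = True
    # At least one cell holds true, so equality means that both cells hold true.
    return heads == tails
```

With a single processor the returned value is always false, and for m processors the detection frequency over many runs should be close to 1 - 2^{-m+1}.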

Pseudocode conventions and notations. We specify algorithms using pseudocode conventions 
natural for the PRAM model. An example of such a representation is in Figure 1. These conventions
are summarized as follows. 

There are two kinds of variables: shared and private. The names of shared variables start with 
capital letters and the names of private ones are all in small letters. To emphasize that a private 
variable x is such that its value may depend on a processor v in a round, we may denote x by x_v
in a pseudocode for v.

When x is a private variable that may have different values at different processors at the same
time, then we denote this variable used by a processor v by x_v. Private variables that have the
same value at the same time in all the processors are usually used without subscripts, like variables
controlling for-loops. An assignment instruction of the form x ← y ← ... ← z ← a, where
x, y, ..., z are variables and a is a value, means to assign a as the value to be stored in all the listed
variables x, y, ..., z.

We want that, at any round of an execution, all the processors that have not terminated yet 
are executing the same line of the pseudocode. In particular, when an instruction is conditional 
on a statement then a processor that does not meet the condition pauses as long as it would be
needed for all the processors that meet the condition to complete their instructions, even when there
are no such processors. If this is not a constant-time instruction, then it may incur an unnecessary
time cost. To avoid this problem, we may first verify if there is some processor that satisfies the
condition, which can be done in constant time. 

We use three notations for logarithms. The notation $\lg x$ stands for the logarithm of x to the
base 2. The notation $\ln x$ denotes the natural logarithm of x. When the base of logarithms does
not matter, we use $\log x$, as in the asymptotic notation $O(\log x)$.

Properties of naming algorithms. Naming algorithms in distributed environments involving 
multi-writer read-write shared memory have to be randomized to break symmetry mm- An 
eventual assignment of proper names cannot be a sure event, because, in principle, two processors 
can generate the same strings of random bits in the course of an execution. We say that an event 
is almost sure, or occurs almost surely, when it occurs with probability 1. When n processors 
generate their private strings of random bits then it is an almost sure event that all these strings 
are eventually pairwise distinct. Therefore, a most advantageous scenario that we could expect, 
when a set of n processors is to execute a randomized naming algorithm, is that the algorithm 
eventually terminates almost surely and that at the moment of termination the output is correct, 
in that the assigned names are without duplicates and fill the whole interval $[1, n]$.

Randomized naming algorithms are categorized as either Monte Carlo or Las Vegas, which are
defined as follows. A randomized algorithm is Las Vegas when it terminates almost surely and 
the algorithm returns a correct output upon termination. A randomized algorithm is Monte Carlo 
when it terminates almost surely and an incorrect output may be produced upon termination, but 
the probability of error converges to zero with the size of input growing unbounded. 

We give algorithms that use the expected number of $O(n \log n)$ random bits with a large
probability. This amount of random information is necessary if an algorithm is to terminate almost
surely. The following fact is essentially folklore, but since we do not know if it was ever proved
in the literature, we give a proof for completeness' sake. Our arguments resort to the notions of
information theory [24].

Proposition 1 If a randomized naming algorithm is correct with probability $p_n$, when executed
by n anonymous processors, then it requires $\Omega(n \log n)$ random bits with probability at least $p_n$.
In particular, a Las Vegas naming algorithm for n processors uses $\Omega(n \log n)$ random bits almost
surely.

Proof: Let us assign conceptual identifiers to the processors, for the sake of argument. These 
unknown identifiers are known only to an external observer and not to algorithms. The purpose of 
executing the algorithm is to assign explicit identifiers, which we call given names. 

Let a processor with an unknown identifier $u_i$ generate a string of bits $b_i$, for $i = 1, \ldots, n$. A
distribution of given names among the n anonymous processors, which results from executing the
algorithm, is a random variable $X_n$ with a uniform distribution on the set of all permutations of the
unknown identifiers. This is because of symmetry: all processors execute the same code, without
explicit private names, and if we rearrange the strings of generated bits $b_i$ among the processors $u_i$,
then this results in the corresponding rearrangement of the given names.

The underlying probability space consists of $n!$ elementary events, each determined by an
assignment of the given names to the processors identified by the unknown identifiers. It follows that
each of these events occurs with probability $1/n!$. The Shannon entropy of the random variable $X_n$
is thus $\lg(n!) = \Theta(n \log n)$. The decision about which assignment of given names is produced is
determined by the random bits, as they are the only source of entropy. It follows that the expected
number of random bits used by the algorithm needs to be as large as the entropy of the random
variable $X_n$.

The property that all assigned names are distinct and in the interval $[1, n]$ holds with
probability $p_n$. An execution needs to generate a total of $\Omega(n \log n)$ random bits with probability at
least $p_n$, because of the bound on entropy. A Las Vegas algorithm terminates almost surely, and
returns correct names upon termination. This means that $p_n = 1$ and so that $\Omega(n \log n)$ random
bits are used almost surely. □
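
The estimate that the entropy $\lg(n!)$ is of order $n \log n$, used in the proof above, can be verified directly. The following short derivation is a standard argument, added here as a worked step rather than taken from the original text; the lower bound keeps only the largest half of the terms of the sum.

```latex
\lg(n!) = \sum_{k=1}^{n} \lg k \le n \lg n ,
\qquad
\lg(n!) \ge \sum_{k=\lceil n/2 \rceil}^{n} \lg k \ge \frac{n}{2} \lg \frac{n}{2} .
```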

A naming algorithm cannot be Las Vegas when n is unknown, as was observed by Kutten et 
al. @7] for asynchronous computations against an oblivious adversary. We show the analogous fact 
for synchronous computations. 

Proposition 2 There is no Las Vegas naming algorithm for a PRAM with n > 1 processors that 
does not refer to the number of processors n in its code. 

Proof: Let us suppose, to arrive at a contradiction, that such a Las Vegas naming algorithm exists.
Consider a system of $n - 1 \ge 1$ processors, and an execution E on these $n - 1$ processors that uses
specific strings of random bits such that the algorithm terminates in E with these random bits.
Such strings of random bits exist because the algorithm terminates almost surely.

Let $v_1$ be a processor that halts latest in E among the $n - 1$ processors. Let $\alpha_1$ be the string
of random bits generated by processor $v_1$ by the time it halts in E. Consider the execution E' on
$n \ge 2$ processors such that $n - 1$ processors obtain the same strings of random bits as in E and an
extra processor $v_2$ obtains $\alpha_1$ as its random bits. The executions E and E' are indistinguishable
for the $n - 1$ processors participating in E, so they assign themselves the same names and halt.
Processor $v_2$ performs the same reads and writes as processor $v_1$, assigns itself the same name
as processor $v_1$ does, and halts in the same round as processor $v_1$. This is the termination round
because by that time all the other processors have halted as well.

It follows that execution E' results in a name being duplicated. The probability of duplication 
for n processors is at least as large as the probability to generate two identical finite random strings 
in E' for some two processors, so this probability is positive. □ 

If n is unknown, then the restriction $O(n \log n)$ on the number of random bits makes it inevitable
that the probability of error is at least polynomially bounded from below, as we show next.

Proposition 3 For unknown n, if a randomized naming algorithm is executed by n anonymous
processors, then an execution is incorrect, in that duplicate names are assigned to distinct processors,
with probability that is at least $n^{-\Omega(1)}$, assuming that the algorithm uses $O(n \log n)$ random bits with
probability $1 - n^{-\Omega(1)}$.

Proof: Suppose the algorithm uses at most $c n \lg n$ random bits with probability $p_n$ when executed
by a system of n processors, for some constant $c > 0$. Then one of these processors uses at most
$c \lg n$ bits with probability $p_n$, by the pigeonhole principle.

Consider an execution for $n + 1$ processors. Let us distinguish a processor v. Consider the actions
of the remaining n processors: one of them, say w, uses at most $c \lg n$ bits with probability $p_n$.
Processor v generates the same string of bits with probability $2^{-c \lg n} = n^{-c}$. The random bits
generated by w and v are independent. Therefore duplicate names occur with probability at
least $n^{-c} \cdot p_n$. When we have a bound on the probabilities $p_n$ of the form $p_n = 1 - n^{-\Omega(1)}$, then
the probability of occurrence of duplicate names is at least $n^{-c} (1 - n^{-\Omega(1)}) = n^{-\Omega(1)}$. □

3 Lower Bounds on Running Time 

We consider two kinds of algorithmic naming problems, as determined by the amount of shared 
memory. One case is for a constant number of shared memory cells, for which we give an optimal 
lower bound on time for $O(1)$ shared memory. The other case is when the number of shared memory
cells and their capacity are unbounded, for which we give an “absolute” lower bound on time. We 
begin with lower bounds that reflect the amount of shared memory. 

Intuitively, as processors generate random bits, these bits need to be made common knowledge 
through some implicit process that assigns explicit names. There is an underlying flow of information
to spread knowledge among the processors through the available shared memory. Time is
bounded from below by the rate of flow of information and the total amount of bits that need to
be shared. 

On the technical level, in order to bound the expected time of a randomized algorithm, we
apply Yao's minimax principle [63] to relate this expected time to the distributional expected
time complexity. A randomized algorithm whose actions are determined by random bits can be
considered as a probability distribution on deterministic algorithms. A deterministic algorithm has
strings of bits given to processors as their inputs, with some probability distribution on such inputs.
The expected time of such a deterministic algorithm, given any specific probability distribution on
the inputs, is a lower bound on the expected time of a randomized algorithm.

To make such interpretation of randomized algorithms possible, we consider strings of bits 
of equal length. With such a restriction on inputs, a deterministic algorithm may not be able to
assign proper names for some assignments of inputs, for example, when all the inputs are equal. 
We augment such deterministic algorithms by adding an option for the algorithm to withhold a 
decision on assignment of names and output “no name” for some processors. This is interpreted 
as the deterministic algorithm needing longer inputs, for which the given inputs are prefixes, and 
which for the randomized algorithm means that some processors need to generate more random 
bits. 

The probability distribution for inputs of a given length will always be the uniform
distribution. This is because we will use an assessment of the amount of entropy of such a
distribution.

Theorem 1 A randomized naming algorithm for a Common PRAM with n processors and $C > 0$
shared memory cells operates in $\Omega(n \log n / C)$ expected time when it is either a Las Vegas algorithm
or a Monte Carlo algorithm with the probability of error smaller than $1/2$.

Proof: We consider Las Vegas algorithms in this argument; the Monte Carlo case is similar, the
difference being in applying Yao's principle for Monte Carlo algorithms. We interpret a randomized
algorithm as a deterministic one working with all possible assignments of random bits as inputs 
with a uniform mass function on the inputs. The expected time of the deterministic algorithm is a 
lower bound on the expected time of the randomized algorithm. 

There are $n!$ possible assignments of given names to the processors. Each of them occurs with
the same probability $1/n!$ when the input bit strings are assigned uniformly at random. Therefore
the entropy of name assignments, interpreted as a random variable, is $\lg n! = \Omega(n \log n)$.

Next we consider executions of such a deterministic algorithm on the inputs with a uniform 
probability distribution. We may assume without loss of generality that an execution is structured 
into the following phases, each consisting of $C + 1$ rounds. In the first round of a phase, each
processor either writes into a shared memory cell or pauses. In the following rounds of a phase, 
every processor learns the current values of each among the C memory cells. This may take C rounds 
for every processor to scan the whole shared memory, but we do not include this reading overhead 
as contributing to the lower bound. Instead, since this is a simulation anyway, we conservatively 
assume that the process of learning all the contents of shared memory cells at the end of a phase 
is instantaneous and complete. 

The Common variant of PRAM requires that if a memory cell is written into concurrently then 
there is a common value that gets written by all the writers. Such a value needs to be determined 
by the code and the address of a memory cell. This means that, for each phase and any memory
cell, a processor choosing to write into this memory cell knows the common value to be written. 
By the structure of execution, in which all processors read all the registers after a round of writing, 
any processor knows what value gets written into each available memory cell in a phase, if any is 
written into a particular cell. This implies that the contents written into shared memory cells may 
not convey any new information but are already implicit in the states of the processors represented 
by their private memories after reading the whole shared memory. 

When a processor reads all the shared memory cells in a phase, then the only new information 
it may learn is the addresses of memory cells into which new writes were performed and those into 
which there were no new writes. This makes it possible to obtain at most C bits of information per
phase, because each register was either written into or not. 

There are $\Omega(n \log n)$ bits of information that need to be settled and one phase changes the
entropy by at most C bits. It follows that the expected number of phases of the deterministic
algorithm is $\Omega(n \log n / C)$. By Yao's principle, $\Omega(n \log n / C)$ is a lower bound on the expected
time of a randomized algorithm. □

For Arbitrary PRAM, writing can spread information through the written values, because
different processors can attempt to write distinct strings of bits. The rate of flow of information is
constrained by the fact that when multiple writers attempt to write to the same memory cell then
only one of them succeeds, if the values written are distinct. This intuitively means that the size of a
group of processors writing to the same register determines how much information the writers learn
by subsequent reading. These intuitions are made formal in the proof of the following Theorem 2.

Theorem 2 A randomized naming algorithm for an Arbitrary PRAM with n processors and $C > 0$
shared memory cells operates in $\Omega(n / C)$ expected time when it is either a Las Vegas algorithm or
a Monte Carlo algorithm with the probability of error smaller than $1/2$.

Proof: We consider Las Vegas algorithms in this argument; the Monte Carlo case is similar,
the difference being in applying Yao's principle for Monte Carlo algorithms. We again replace a given
randomized algorithm by its deterministic version that works on assignments of strings of bits of the 
same length as inputs, with such inputs assigned uniformly at random to the processors. The goal 
is to use the property that the expected time of this deterministic algorithm, for a given probability 
distribution of inputs, is a lower bound on the expected time of the randomized algorithm. Next, 
we consider executions of this deterministic algorithm. 

Similarly as in the proof of Theorem 1, we observe that there are $n!$ assignments of given names
to the processors and each of them occurs with the same probability $1/n!$, when the input bit strings
are assigned uniformly at random. The entropy of name assignments is again $\lg n! = \Omega(n \log n)$.
The algorithm needs to make the processors learn $\Omega(n \log n)$ bits using the available $C > 0$ shared
memory cells.

We may interpret an execution as structured into phases, such that each processor performs 
at most one write in a phase and then reads all the registers. The time of a phase is assumed 
conservatively to be 0(1). Consider a register and a group of processors that attempt to write their 
values into this register in a phase. The values attempted to be written are represented as strings 
of bits. If some of these values have 0 and some have 1 at some bit position among the strings, 
then this bit position may convey one bit of information. The maximum amount of information is 
provided by a write when the written string of bits facilitates identifying the writer by comparing
its written value to the other values attempted to be written concurrently to the same memory cell.
This amount is at most the binary logarithm of the size of this group of processors. Therefore, each
memory cell written to in a round contributes at most $\lg n$ bits of information, because there may
be at most n writers to it. Since there are C registers, the maximum number of bits of information
learnt by the processors in a phase is $C \lg n$.

The entropy of the assignment of names is $\lg n! = \Omega(n \log n)$, so the expected number of phases
of the deterministic algorithm is $\Omega(n \lg n / (C \lg n)) = \Omega(n / C)$. By Yao's principle, this is also
a lower bound on the expected time of a randomized algorithm. □
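
The counting behind Theorems 1 and 2 can be illustrated numerically. The sketch below divides the entropy $\lg n!$ by the maximum number of bits a phase can reveal: C bits on a Common PRAM and $C \lg n$ bits on an Arbitrary PRAM. The function names are illustrative, and the returned values ignore the constant factors hidden in the asymptotic notation.

```python
import math


def entropy_bits(n: int) -> float:
    """Return lg(n!), the entropy in bits of a uniformly random name assignment."""
    return math.lgamma(n + 1) / math.log(2)


def common_phases(n: int, cells: int) -> float:
    """Phases needed when each phase reveals at most `cells` bits (Common PRAM)."""
    return entropy_bits(n) / cells


def arbitrary_phases(n: int, cells: int) -> float:
    """Phases needed when each phase reveals at most cells * lg n bits (n >= 2)."""
    return entropy_bits(n) / (cells * math.log2(n))


if __name__ == "__main__":
    for n in (16, 1024, 2 ** 20):
        print(n, round(common_phases(n, 1)), round(arbitrary_phases(n, 1)))
```

With a single memory cell, the first count grows like $n \lg n$ and the second like n, which matches the gap between the two theorems.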

Next, we consider “absolute” requirements on time for a PRAM to assign unique names to the 
available n processors. The generality of the lower bound we give stems from the weakness of the 
assumptions. First, nothing is assumed about the knowledge of n. Second, concurrent writing is 
not constrained in any way. Third, shared memory cells are unbounded in their number and size. 

We show next in Theorem 3 that any Las Vegas naming algorithm has $\Omega(\log n)$ expected time
for the synchronous schedule of events. The argument we give is in the spirit of similar arguments
applied by Cook et al. [23] and Beame [15]. In an analogous manner, Kutten et al. showed
that any Las Vegas naming algorithm for asynchronous read-write shared memory systems has the
expected time $\Omega(\log n)$ against a certain oblivious schedule. What these arguments share, along
with the arguments we employ in this paper, is a formalization of the notion of flow of information
during an execution of an algorithm, combined with a recursive estimate of the rate of this flow.

The relation processor v knows processor w in round t is defined recursively as follows. First,
for any processor v, we have that v knows v in any round $t > 0$. Second, if a processor v writes
to a shared memory cell R in a round $t_1$ and a processor w reads from R in a round $t_2 > t_1$,
such that there was no other write into this memory cell after $t_1$ and prior to $t_2$, then processor w
knows in round $t_2$ each processor that v knows in round $t_1$. Finally, the relation is the smallest
transitive relation that satisfies the two postulates formulated above. This means that it is the
smallest relation such that if processor v knows processor w in round $t_1$ and z knows v in round $t_2$
such that $t_2 > t_1$ then processor z knows w in round $t_2$. In particular, the knowledge accumulates
with time, in that if a processor v knows processor z in round $t_1$ and round $t_2$ is such that $t_2 > t_1$
then v knows z in round $t_2$ as well.

Lemma 2 Let A be a deterministic algorithm that assigns distinct names to the processors, with
the possibility that some processors output “no name” for some inputs, when each processor has an input
string of bits of the same length. When algorithm A terminates with proper names assigned to all
the processors, each processor knows all the other processors.

Proof: We may assume that $n > 1$, as otherwise the only processor knows itself. Let us consider an
assignment I of inputs that results in a proper assignment of distinct names to all the processors
when algorithm A terminates. This implies that all the inputs in the assignment I are distinct
strings of bits, as otherwise some two processors, say, v and w, that obtain the same input string of
bits would either assign themselves the same name or declare “no name” as output.

Suppose that a processor v does not know a processor w, when v halts for inputs from I.
Consider an assignment J of inputs which is the same as I for processors different from w and such
that the input of w is the same as the input for v in I. Then the actions of processor v would be the
same with J as with I, because v is not affected by the input of w, so that v would assign itself
the same name with J as with I. But the actions of processor w would be the same in J as those
of v, because their input strings of bits are identical under J. It follows that w would assign itself
the name of v, resulting in duplicate names. This contradicts the assumption that all processors
obtain unique names in the execution. □

We will use Lemma 2 to assess running times by estimating the number of interleaved reads and
writes needed for processors to get to know all the processors. The rate of learning such information
may depend on time, because we do not restrict the amount of shared memory, unlike in Theorems 1
and 2. Indeed, the rate may increase exponentially, under most liberal estimates.

The following Theorem 3 holds for both Common and Arbitrary PRAMs. The argument used
in the proof is general enough not to depend on any specific semantics of writing.

Theorem 3 A randomized naming algorithm for a PRAM with n processors operates in $\Omega(\log n)$
expected time when it is either a Las Vegas algorithm or a Monte Carlo algorithm with the probability
of error smaller than $1/2$.

Proof: The argument is for a Las Vegas algorithm; the Monte Carlo case is similar. A randomized
algorithm can be interpreted as a probability distribution on a finite set of deterministic algorithms. 
Such an interpretation works when input strings for a deterministic algorithm are of the same length. 
We consider all such possible lengths for deterministic algorithms, similarly as in the previous proofs 
of lower bounds. 

Let us consider a deterministic algorithm A, and let inputs be strings of bits of the same length. 
We may structure an execution of this algorithm A into phases as follows. A phase consists of 
two rounds. In the first round of a phase, each processor either writes to a shared memory cell or 
pauses. In the second round of a phase, each processor either reads from a shared memory cell or 
pauses. Such structuring can be done without loss of generality at the expense of slowing down an 
execution by a factor of at most 2. Observe that the knowledge in the first round of a phase is the
same as in the last round of the preceding phase.

Phases are numbered by consecutively increasing integers, starting from 1. Phase i comprises
the pair of rounds $\{2i - 1, 2i\}$, for integers $i \ge 1$. In particular, the first phase consists of rounds 1 and 2.
We also add phase 0 that represents the knowledge before any reads or writes were performed.

We show the following invariant, for $i \ge 0$: a processor knows at most $2^i$ processors at the end
of phase i. The proof of this invariant is by induction on i.

The base case is for $i = 0$. The invariant follows from the fact that a processor knows only one
processor in phase 0, namely itself, and $2^0 = 1$.

To show the inductive step, suppose the invariant holds for a phase $i \ge 0$ and consider the next
phase $i + 1$. A processor v may increase its knowledge by reading in the second round of phase $i + 1$.
Suppose the read is from a shared memory cell R. The latest write into this memory cell occurred
by the first round of phase $i + 1$. This means that the processor w that wrote to R by phase $i + 1$,
as the last one that did write, knew at most $2^i$ processors in the round of writing, by the inductive
assumption and the fact that what is written in phase $i + 1$ was learnt by the immediately preceding
phase i. Moreover, by the semantics of writing, the value written to R by w in that round removed
any previous information stored in R. Processor v starts phase $i + 1$ knowing at most $2^i$ processors,
and also learns of at most $2^i$ other processors by reading in phase $i + 1$, namely, those known
by the latest writer of the read contents. It follows that processor v knows at most $2^i + 2^i = 2^{i+1}$
processors by the end of phase $i + 1$.

When proper names are assigned by such a deterministic algorithm, each processor knows
every other processor, by Lemma 2. A processor knows every other processor in a phase j such
that $2^j \ge n$, by the invariant just proved. Such a phase number j satisfies $j \ge \lg n$, and it takes
$2 \lg n$ rounds to complete $\lg n$ phases.

Let us consider input strings of bits assigned to processors uniformly at random. We need to
estimate the expected running time of an algorithm A on such inputs. Let us observe that, in the
context of interpreting deterministic executions for the sake of applying Yao's principle, terminating
executions of A that do not result in names assigned to all the processors could be pruned from a 
bound on their expected running time, because such executions are determined by bounded input 
strings of bits that a randomized algorithm would extend to make them sufficiently long to assign 
proper names. In other words, from the perspective of randomized algorithms, such prematurely 
ending executions do not represent real terminating ones. 

The expected time of A, conditional on terminating with proper names assigned, is therefore
at least $2 \lg n$. We conclude, by Yao's principle, that any randomized naming algorithm has
$\Omega(\log n)$ expected runtime. □
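
The doubling invariant from the proof can be observed in a small simulation. The sketch below is an illustration under stated assumptions, not part of the paper: n is a power of two, each processor has its own memory cell, and in phase i processor v reads the cell of processor v XOR $2^{i-1}$. This exchange pattern is one best-case schedule; it reaches full knowledge in exactly $\lg n$ phases while never exceeding the bound of $2^i$ known processors after phase i.

```python
def phases_until_all_known(n: int) -> int:
    """Count two-round phases until every processor knows all n processors."""
    assert n >= 1 and n & (n - 1) == 0, "this sketch assumes n is a power of two"
    knows = [{v} for v in range(n)]  # phase 0: each processor knows only itself
    phase = 0
    while any(len(known) < n for known in knows):
        stride = 1 << phase
        phase += 1
        # First round: processor v writes everything it knows into cell v.
        cells = [set(known) for known in knows]
        # Second round: processor v reads the cell of its partner v XOR stride.
        knows = [knows[v] | cells[v ^ stride] for v in range(n)]
        # Invariant from the proof: at most 2^phase processors are known.
        assert all(len(known) <= 2 ** phase for known in knows)
    return phase


if __name__ == "__main__":
    for n in (1, 2, 8, 64):
        print(n, phases_until_all_known(n))
```

Since no schedule can beat the invariant, the simulated schedule shows that the bound of $\lg n$ phases is tight with respect to the flow of knowledge.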

The three lower bounds on time given in this Section may be applied in two ways. One is to 
infer optimality of time for a given amount of shared memory used. Another is to infer optimality 
of shared memory use given a time performance. This is summarized in the following Corollary 1.

Corollary 1 If the expected time of a naming Las Vegas algorithm is $O(n)$ on an Arbitrary PRAM
with $O(1)$ shared memory, then this time performance is asymptotically optimal. If the expected time
of a naming Las Vegas algorithm is $O(n \log n)$ on a Common PRAM with $O(1)$ shared memory, then
this time performance is asymptotically optimal. If a Las Vegas naming algorithm operates in time
$O(\log n)$ on an Arbitrary PRAM using $O(n / \log n)$ shared memory cells, then this amount of shared
memory is asymptotically optimal. If a Las Vegas naming algorithm operates in time $O(\log n)$ on a
Common PRAM using $O(n)$ shared memory cells, then this amount of shared memory is optimal.

Proof: We verify that the lower bounds match the assumed upper bounds. By Theorem 2, a Las
Vegas algorithm operates almost surely in $\Omega(n)$ time on an Arbitrary PRAM when space is $O(1)$. By
Theorem 1, a Las Vegas algorithm operates almost surely in $\Omega(n \log n)$ time on a Common PRAM
when space is $O(1)$. By Theorem 2, a Las Vegas algorithm operates almost surely in $\Omega(\log n)$ time
on an Arbitrary PRAM when space is $O(n / \log n)$. By Theorem 1, a Las Vegas algorithm operates
almost surely in $\Omega(\log n)$ time on a Common PRAM when space is $O(n)$. □


4 Las Vegas for Arbitrary with Bounded Memory 

We present a Las Vegas naming algorithm for an Arbitrary PRAM with a constant number of 
shared memory cells, in the case when the number of processors n is known. 

During an execution of this algorithm, processors repeatedly write random strings of bits
representing integers to a shared memory cell called Pad, and next read Pad to verify the outcome of
Algorithm Arbitrary-Bounded-LV

repeat
    initialize Counter ← name_v ← 0
    bin_v ← random integer in [1, n^β]
    for i ← 1 to n do
        if name_v = 0 then
            Pad ← bin_v
            if Pad = bin_v then
                Counter ← Counter + 1
                name_v ← Counter
until Counter = n


Figure 2: A pseudocode for a processor v of an Arbitrary PRAM, where the number
of shared memory cells is a constant independent of n. The variables Counter and
Pad are shared. The private variable name stores the acquired name. The constant
$\beta > 0$ is a parameter to be determined by analysis.


writing. A processor v that reads the same value as it attempted to write increments the integer
stored in a shared register Counter and uses the obtained number as a tentative name, which it
stores in a private variable name_v. The value of Counter could get incremented a total of fewer
than n times, which occurs when some two processors chose the same random integer to write to
the register Pad. The correctness of the assigned names is verified by the equality Counter = n,
because Counter was initialized to zero. When such a verification fails, this results in another
iteration of a series of writes to register Pad; otherwise the execution terminates and the value
stored at name_v becomes the final name of processor v.

This algorithm is called Arbitrary-Bounded-LV and its pseudocode is given in Figure 2.
The pseudocode refers to a constant $\beta > 0$ which determines the bounded range $[1, n^{\beta}]$ from which
processors select integers to write to the shared register Pad.
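
The behavior of algorithm Arbitrary-Bounded-LV can be explored with a sequential simulation. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the Arbitrary PRAM is modeled by letting a uniformly random writer win each concurrent write to Pad, the default value of `beta` is chosen arbitrarily, and the function and parameter names are hypothetical.

```python
import random


def arbitrary_bounded_lv(n, beta=3, rng=None):
    """Simulate the naming algorithm; return the list of names, indexed by processor."""
    rng = rng or random.Random()
    while True:  # repeat ... until Counter = n
        counter = 0  # shared variable Counter
        name = [0] * n  # private variables name_v
        bins = [rng.randint(1, n ** beta) for _ in range(n)]  # private bin_v
        for _ in range(n):
            writers = [v for v in range(n) if name[v] == 0]
            if not writers:
                break  # remaining iterations would be idle
            # Arbitrary PRAM: one of the concurrent writes to Pad succeeds.
            pad = bins[rng.choice(writers)]
            winners = [v for v in writers if bins[v] == pad]
            # All winners read the same Counter and write back Counter + 1,
            # so Counter grows by one and colliding winners share a name.
            counter += 1
            for v in winners:
                name[v] = counter
        if counter == n:
            return name


if __name__ == "__main__":
    print(arbitrary_bounded_lv(8, rng=random.Random(2024)))
```

When two processors draw the same integer, they become winners together, Counter ends below n, and the outer loop repeats, which is how the final test Counter = n detects duplicates.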

Balls into bins. The selection of random integers in the range $[1, n^{\beta}]$ by n processors can be
interpreted as throwing n balls into $n^{\beta}$ bins, which we call the $\beta$-process. A collision represents two
processors assigning themselves the same name. Therefore an execution of the algorithm can be
interpreted as performing such ball placements repeatedly until there is no collision.

Lemma 3 For each $a > 0$ there exists $\beta > 0$ such that when n balls are thrown into $n^{\beta}$ bins during
the $\beta$-process, the probability of a collision is at most $n^{-a}$.

Proof: Consider the balls thrown one by one. When a ball is thrown, at most n bins are
already occupied, so the probability of the ball ending in an occupied bin is at most
$n / n^{\beta} = n^{-\beta+1}$. No collisions occur with probability that is at least

$$\left(1 - \frac{1}{n^{\beta-1}}\right)^{n} \ge 1 - \frac{n}{n^{\beta-1}} = 1 - n^{-\beta+2} \qquad (1)$$

by Bernoulli's inequality. If we take $\beta > a + 2$ then just one iteration of the repeat-loop is
sufficient with probability that is at least $1 - n^{-a}$. □
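
The estimate in the proof can be compared with the exact probability of no collision. The sketch below is a check under the assumption that $\beta$ is a positive integer, so that the number of bins $n^{\beta}$ is an integer; it multiplies the probabilities that each successive ball lands in an empty bin and compares the product with the bound $1 - n^{-\beta+2}$. The function names are illustrative.

```python
from fractions import Fraction


def no_collision_probability(n: int, beta: int) -> Fraction:
    """Exact probability that n balls thrown into n**beta bins land in distinct bins."""
    bins = n ** beta
    probability = Fraction(1)
    for occupied in range(n):
        # The next ball must avoid the `occupied` bins that already hold a ball.
        probability *= Fraction(bins - occupied, bins)
    return probability


def bernoulli_lower_bound(n: int, beta: int) -> Fraction:
    """The lower bound 1 - n^(-beta+2) obtained from Bernoulli's inequality."""
    return 1 - Fraction(n ** 2, n ** beta)


if __name__ == "__main__":
    for n, beta in ((4, 3), (10, 3), (10, 4)):
        exact = no_collision_probability(n, beta)
        print(n, beta, float(exact), float(bernoulli_lower_bound(n, beta)))
```

The exact value is always at least the bound, since each factor of the product is at least $1 - n^{-\beta+1}$.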

Next we summarize the performance of algorithm Arbitrary-Bounded-LV as a Las Vegas 
algorithm. 


Theorem 4 Algorithm Arbitrary-Bounded-LV terminates almost surely and there is no error
when it terminates. For any $a > 0$, there exist $\beta > 0$ and $c > 0$ such that the algorithm
terminates within time $cn$ using at most $cn \ln n$ random bits with probability at least $1 - n^{-a}$.


Proof: The algorithm assigns consecutive names from a continuous interval starting from 1, by
the pseudocode in Figure 2. It terminates after n different tentative names have been assigned, by
the condition controlling the repeat-loop in the pseudocode of Figure 2. This means that proper
names have been assigned when the algorithm terminates.

We map an execution of the $\beta$-process on an execution of algorithm Arbitrary-Bounded-LV
in a natural manner. Under such an interpretation, Lemma 3 estimates the probability of the
event that the n processors select different numbers in the interval $[1, n^{\beta}]$ as their values to write
to Pad in one iteration of the repeat-loop. This implies that just one iteration of the repeat-loop
is sufficient with the probability that is at least $1 - n^{-a}$. The probability of the event that i
iterations are not sufficient to terminate is at most $n^{-ia}$, which converges to 0 as i increases, so
the algorithm terminates almost surely. One iteration of the repeat-loop takes $O(n)$ rounds and it
requires $O(n \log n)$ random bits. □

Algorithm Arbitrary-Bounded-LV is optimal among Las Vegas naming algorithms with
respect to its expected running time $O(n)$, given the amount $O(1)$ of its available shared memory,
by Corollary 1 in Section 3, and the expected number of random bits $O(n \log n)$, by Proposition 1
in Section 2.


5 Las Vegas for Arbitrary with Unbounded Memory 

In this section, we give a Las Vegas algorithm for an Arbitrary PRAM with an unbounded supply
of shared memory cells, in the case when the number of processors n is known. This algorithm is
called Arbitrary-Unbounded-LV and its pseudocode is given in Figure 3.

The algorithm uses two arrays Bin and Counter of $n / \ln n$ shared memory cells each. An execution
proceeds by repeated attempts to assign names. During each such attempt, the processors work
to assign tentative names. Next, the number of distinct tentative names is obtained and if the count
equals n then the tentative names become final, otherwise another attempt is made. We assume
that each such attempt uses a new segment of memory cells Counter initialized to 0s, which is
to simplify the exposition and analysis. An attempt to assign tentative names proceeds by each
processor v selecting two integers bin_v and label_v uniformly at random, where
bin_v $\in [1, n / \ln n]$ and label_v $\in [1, n^{\beta}]$.
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Algorithm Arbitrary-Unbounded-LV 


repeat 

allocate Counter [1, /* array of fresh memory cells initialized to Os */ 

initialize position^, •(— (0,0) 

bin^, ^ a random integer in [1, j^] 

labelj, ■(— a random integer in 

repeat 

initialize All-Named true 
if position^, = (0,0) then 
Bin [bin^,] labels 
if Bin [bin^,] = labelj, then 

Counter [bin^,] Counter [bin^,] + 1 
position^ •(—(bin^. Counter [biny]) 
else All-Named false 

until All-Named /* each processor has a tentative name */ 

narne^ rank of position^ 

until n is the maximum name /* no duplicates among tentative names */ 


Figure 3: A pseudocode for a processor v of an Arbitrary PRAM, where the number 
of shared memory cells is unbounded. The variables Bin and Counter denote arrays 
of n/ln n shared memory cells each, the variable All-Named is also shared. The private 
variable name stores the acquired name. The constant β > 0 is a parameter to be 
determined by analysis. 


Next the processors repeatedly attempt to write their labels into Bin[bin_v]. Each such write 
is followed by a read, and the lucky writer first increments Counter[bin_v] and then uses the value 
of memory register Counter[bin_v] to create a pair of numbers (bin_v, Counter[bin_v]), which is 
called the position of v and is stored in variable position_v. After all processors have their positions 
determined, we define their ranks as follows. To find the rank of position_v, we arrange all such 
pairs in lexicographic order, comparing first on bin and then on Counter[bin], and the rank is the 
position of this entry in the resulting list, where the first entry has position 1, the second 2, and so 
on. 

Ranks can be computed using a prefix-type algorithm operating in time O(log n). This algorithm 
first finds for each bin ∈ [1, n/ln n] the sum s(bin) = Σ_{1 ≤ i < bin} Counter[i]. Next, each processor v 
with a position (bin_v, c) assigns to itself s(bin_v) + c as its rank. After ranks have been computed, 
they are used as tentative names. 
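
The attempt described above can be illustrated by a sequential Python sketch. It is not part of the paper's pseudocode: the function name one_attempt, the 0-based indexing, the bin count n/ln n, and the use of a random choice to model which concurrent write succeeds on an Arbitrary PRAM are all assumptions made for illustration, and the sequential prefix sums only stand in for the O(log n)-time parallel prefix computation.

```python
import math
import random
from collections import defaultdict
from itertools import accumulate

def one_attempt(n, beta, rng):
    """Simulate one iteration of the main repeat-loop, sequentially.

    Returns (names, rounds): tentative names of processors 0..n-1 and the
    number of iterations of the inner repeat-loop.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    num_bins = max(1, int(n / math.log(n)))   # assumed bin count n / ln n
    num_labels = max(1, int(n ** beta))       # labels come from [1, n^beta]
    bin_of = [rng.randrange(num_bins) for _ in range(n)]
    label_of = [rng.randrange(num_labels) for _ in range(n)]
    counter = [0] * num_bins
    position = [None] * n
    rounds = 0
    while any(p is None for p in position):
        rounds += 1
        waiting = defaultdict(list)
        for v in range(n):
            if position[v] is None:
                waiting[bin_of[v]].append(v)
        for b, vs in waiting.items():
            # Arbitrary PRAM: exactly one of the concurrent writes succeeds.
            written = label_of[rng.choice(vs)]
            counter[b] += 1
            for v in vs:
                if label_of[v] == written:    # v reads back its own label
                    position[v] = (b, counter[b])
    # Exclusive prefix sums: s[b] = counter[0] + ... + counter[b - 1].
    s = [0] + list(accumulate(counter))[:-1]
    names = [s[b] + c for (b, c) in position]
    return names, rounds

if __name__ == "__main__":
    names, rounds = one_attempt(64, 3.0, random.Random(5))
    print("attempt succeeded:", max(names) == 64, "inner rounds:", rounds)
```

In this sketch two processors that share both a bin and a label obtain the same position, so the maximum name falls below n, which is exactly the condition that triggers another attempt.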

In the analysis of algorithm Arbitrary-Unbounded-LV we will refer to the following bound 
on independent Bernoulli trials. Let S_n be the number of successes in n independent Bernoulli 
trials, with p as the probability of success. Let b(i; n, p) be the probability of an occurrence of 
exactly i successes. For r > np, the following bound holds: 

    \[ \Pr(S_n \ge r) \le b(r; n, p) \cdot \frac{r(1-p)}{r - np} , \tag{2} \]

see Feller [30]. 
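
As a numerical sanity check, the tail of a binomial distribution can be compared with Feller's bound, taken here in the form Pr(S_n ≥ r) ≤ b(r; n, p) · r(1 - p)/(r - np). The Python sketch below is only an illustration; the function names and the sample parameters are assumptions.

```python
from math import comb

def binom_pmf(i, n, p):
    """b(i; n, p): probability of exactly i successes in n trials."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

def binom_tail(r, n, p):
    """Pr(S_n >= r), computed exactly by summation."""
    return sum(binom_pmf(i, n, p) for i in range(r, n + 1))

def feller_bound(r, n, p):
    """Upper bound on Pr(S_n >= r); valid only for r > n * p."""
    if r <= n * p:
        raise ValueError("the bound requires r > np")
    return binom_pmf(r, n, p) * r * (1 - p) / (r - n * p)

if __name__ == "__main__":
    n, p, r = 100, 0.05, 10
    print(binom_tail(r, n, p), "<=", feller_bound(r, n, p))
```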

Balls into bins. We consider a process of throwing n balls into n/ln n bins. Each ball has a label 
assigned randomly from the range [1, n^β], for β > 0. We say that a labeled collision occurs when 
there are two balls with the same labels in the same bin. We refer to this process as the β-process. 

Lemma 4 For each α > 0 there exist β > 0 and c > 0 such that when n balls are labeled with 
random integers in [1, n^β] and next are thrown into n/ln n bins during the β-process then there are at 
most c ln n balls in every bin and no labeled collision occurs with probability 1 - n^{-α}. 


Proof: We estimate from above the probabilities of the events that there are more than c ln n 
balls in some bin and that there is a labeled collision. We show that each of them can be made to 
be at most n^{-α}/2, from which it follows that at least one of these two events occurs with 
probability at most n^{-α}. 

Let p denote the probability of selecting a specific bin when throwing a ball, which is p = (ln n)/n. 
When we set r = c ln n, for a sufficiently large c > 1, then 

    \[ b(r; n, p) = \binom{n}{c \ln n} \Bigl(\frac{\ln n}{n}\Bigr)^{c \ln n} \Bigl(1 - \frac{\ln n}{n}\Bigr)^{n - c \ln n} . \tag{3} \]

Formula (3) translates (2) into the following bound: 

    \[ \Pr(S_n \ge c \ln n) \le \binom{n}{c \ln n} \Bigl(\frac{\ln n}{n}\Bigr)^{c \ln n} \Bigl(1 - \frac{\ln n}{n}\Bigr)^{n - c \ln n} \cdot \frac{c \bigl(1 - \frac{\ln n}{n}\bigr)}{c - 1} . \tag{4} \]

The right-hand side of (4) can be estimated by the following upper bound: 

    \[ n^{-c \ln c + c - 1} \]

for each sufficiently large n > 0. This is because 

which converges to 1. The probability that the number of balls in some bin is greater than c ln n is 
therefore at most n · n^{-c ln c + c - 1} = n^{-c(ln c - 1)}, by the union bound. This probability can be made 
smaller than n^{-α}/2 for a sufficiently large c > e. 

The probability of a labeled collision is at most that of a collision when n balls are thrown 
into n^β bins. This probability is at most n^{-β+2} by bound (1) used in the proof of Lemma 3. This 
number can be made at most n^{-α}/2 for a sufficiently large β. □ 

Next we summarize the performance of algorithm Arbitrary-Unbounded-LV as a Las Vegas 
algorithm. 

Theorem 5 Algorithm Arbitrary-Unbounded-LV terminates almost surely and there is no 
error when the algorithm terminates. For any α > 0, there exist β > 0 and c > 0 such that the 
algorithm assigns names within c ln n time and generates at most cn ln n random bits with probability 
at least 1 - n^{-α}. 

Proof: The algorithm terminates only when n different names have been assigned, which is provided 
by the condition that controls the main repeat-loop in Figure 3. This means that there is no error 
when the algorithm terminates. 

We map executions of the β-process on executions of algorithm Arbitrary-Unbounded-LV 
in a natural manner. The main repeat-loop ends after an iteration in which the processors in each group 
that selected the same value for the variable bin have selected distinct values for the variable label. We 
interpret the random selections in an execution as throwing n balls into n/ln n bins, where a number 
bin determines a bin. The number of iterations of the inner repeat-loop equals the maximum 
number of balls in a bin. 

For any α > 0, it follows that one iteration of the main repeat-loop suffices with probability 
at least 1 - n^{-α}, for a suitable β > 0, by Lemma 4. It follows that i iterations are executed by 
termination with probability at most n^{-(i-1)α}, so the algorithm terminates almost surely. 

Let us take c > 0 as in Lemma 4. It follows that an iteration of the main repeat-loop takes 
at most c ln n steps and one processor uses at most c ln n random bits in this one iteration with 
probability at least 1 - n^{-α}. □ 

Algorithm Arbitrary-Unbounded-LV is optimal among Las Vegas naming algorithms with 
respect to the following performance measures: the expected time O(log n), by Theorem 3, the 
number of shared memory cells O(n/log n) used to achieve this running time, by Corollary 1, both 
in Section 3, and the expected number of used random bits O(n log n), by Proposition 1 in Section 2. 


6 Las Vegas for Common with Bounded Memory 

We consider the case of Common PRAM when the number of processors n is known and the 
number of available shared memory cells is constant. We propose a Las Vegas algorithm called 
Common-Bounded-LV, whose pseudocode is given in Figure 4. 

An execution of the algorithm is organized as repeated “attempts” to assign temporary names. 
During such an attempt, each processor without a name chooses uniformly at random an integer 
in the interval [1, number-of-bins], where number-of-bins is a parameter initialized to n; such 
a selection is interpreted in a probabilistic analysis as throwing a ball into number-of-bins many 
bins. Next, for each i ∈ [1, number-of-bins], the processors that selected i, if any, verify if they 
are unique in their selection of i by executing procedure Verify-Collision (given in Figure 1 in 
Section 2) β ln n times, where β > 0 is a number that is determined in analysis. 

After no collision has been detected, a processor that selected i assigns itself a consecutive name 
by reading and incrementing the shared variable Last-Name. It takes up to β · number-of-bins · ln n 
verifications for collisions for all integers in [1, number-of-bins]. When this is over, the value of 
variable number-of-bins is modified by decrementing it by the number of new names just assigned, 
when working with the last number-of-bins, unless such decrementing would result in a number 
in number-of-bins that is at most β ln n, in which case the variable number-of-bins is set to 
n/(β ln n). An attempt ends when all processors have tentative names assigned. 

Algorithm Common-Bounded-LV 

repeat 
    initialize number-of-bins ← n ; name_v ← Last-Name ← 0 ; collision_v ← false 
    repeat 
        initialize Collision-Detected ← collision_v ← false 
        if name_v = 0 then 
            bin_v ← random integer in [1, number-of-bins] 
            for i ← 1 to number-of-bins do 
                for j ← 1 to β ln n do 
                    if bin_v = i then 
                        if Verify-Collision then 
                            Collision-Detected ← collision_v ← true 
                if bin_v = i and not collision_v then 
                    Last-Name ← Last-Name + 1 
                    name_v ← Last-Name 
        if n - Last-Name > β ln n 
            then number-of-bins ← (n - Last-Name) 
            else number-of-bins ← n/(β ln n) 
    until not Collision-Detected 
until Last-Name = n 


Figure 4: A pseudocode for a processor v of a Common PRAM, where there is 
a constant number of shared memory cells. Procedure Verify-Collision has its 
pseudocode in Figure 1; lack of parameter means the default parameter 1. The 
variables Collision-Detected and Last-Name are shared. The private variable name 
stores the acquired name. The constant β is a parameter to be determined by analysis. 

These names become final when there are a total of n of them, otherwise there are duplicates, 
so another attempt is performed. The main repeat loop in the pseudocode in Figure 4 represents an 
attempt to assign tentative names to each processor. An iteration of the inner repeat loop during 
which number-of-bins > n/(β ln n) is called shrinking and otherwise it is called restored. 

Balls into bins. As a preparation for the analysis of performance of algorithm 
Common-Bounded-LV, we consider a related process of repeatedly throwing balls into bins, which we 
call the β-process. The β-process proceeds through stages, each representing one iteration of the 
inner repeat-loop in Figure 4. A stage results in some balls removed and some transitioning to the 
next stage, so that eventually no balls remain and the process terminates. 

The balls that participate in a stage are called eligible for the stage. In the first stage, n balls 
are eligible and we throw n balls into n bins. Initially, we apply the principle that after all eligible 
balls have been placed into bins during a stage, the singleton bins along with the balls in them are 
removed. A stage after which bins are removed is called shrinking. There are k bins and k balls in 
a shrinking stage; we refer to k as the length of this stage. Given balls and bins for any stage, we 
choose a bin uniformly at random and independently for each ball in the beginning of a stage and 
next place the balls in their selected destinations. The bins that either are empty or multiple in a 
shrinking stage stay for the next stage. The balls from multiple bins become eligible for the next 
stage. 

This continues until such a shrinking stage after which at most β ln n balls remain. Then we 
restore bins for a total of n/(β ln n) of them to be used in the following stages, during which 
we never remove any bin; these stages are called restored. In these final restored stages, we keep 
removing singleton balls at the end of a stage, while balls from multiple bins stay as eligible for the 
next restored stage. This continues until all balls are removed. 
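
The process of shrinking and restored stages can be explored with a short Python simulation. This is only a sketch under assumptions: the function name simulate_beta_process is hypothetical, the number of restored bins is rounded down to an integer, and the simulation reports the two quantities that the analysis below is concerned with, namely the sum of lengths of shrinking stages and the number of restored stages.

```python
import math
import random
from collections import Counter

def simulate_beta_process(n, beta, rng):
    """Return (sum of lengths of shrinking stages, number of restored stages)."""
    if n < 2 or beta <= 0:
        raise ValueError("need n >= 2 and beta > 0")
    threshold = beta * math.log(n)
    restored_bins = int(n / threshold)
    if restored_bins < 2:
        raise ValueError("n is too small for this beta")
    balls = n                      # balls eligible for the current stage
    shrink_total = 0
    restored_stages = 0
    while balls > 0:
        if balls > threshold:      # shrinking stage: k balls into k bins
            bins = balls
            shrink_total += balls
        else:                      # restored stage: n / (beta ln n) bins
            bins = restored_bins
            restored_stages += 1
        load = Counter(rng.randrange(bins) for _ in range(balls))
        singletons = sum(1 for c in load.values() if c == 1)
        balls -= singletons        # singleton balls are removed
    return shrink_total, restored_stages

if __name__ == "__main__":
    total, restored = simulate_beta_process(1000, 2.0, random.Random(1))
    print("sum of shrinking lengths:", total, "restored stages:", restored)
```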

Lemma 5 For any α > 0, there exists β > 0 such that the sum of lengths of all shrinking stages in 
the β-process is at most 2en, where e is the base of natural logarithms, and there are at most β ln n 
restored stages, both events holding with probability 1 - n^{-α}, for sufficiently large n. 


Proof: We consider two cases depending on the kind of analyzed stages. Let k ≤ n denote the 
length of a stage. 

In a shrinking stage, we throw k balls into k bins, while choosing bins independently and 
uniformly at random. The probability that a ball ends up singleton can be bounded from below as 
follows: 

    \[ \Bigl(1 - \frac{1}{k}\Bigr)^{k-1} \ge \Bigl(e^{-\frac{1}{k-1}}\Bigr)^{k-1} = \frac{1}{e} , \]

where we used the inequality 1 - x ≥ e^{-x/(1-x)}, which holds for 0 ≤ x < 1. 

Let Z_k be the number of singleton balls after k balls are thrown into k bins. It follows that the 
expectation of Z_k satisfies E[Z_k] ≥ k/e. 
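
The lower bound on the expected number of singleton balls can be checked numerically. The sketch below uses linearity of expectation: a fixed ball is alone in its bin exactly when the other k - 1 balls miss that bin, so the expectation equals k(1 - 1/k)^{k-1}. The function name is an assumption made for illustration.

```python
import math

def expected_singletons(k):
    """Exact expected number of singleton balls for k balls in k bins."""
    # A fixed ball is single iff each of the other k - 1 balls misses its bin.
    return k * (1 - 1 / k) ** (k - 1)

if __name__ == "__main__":
    for k in (1, 2, 10, 1000):
        print(k, expected_singletons(k), k / math.e)
```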


To estimate the deviation of Z_k from its expected value, we use the bounded differences 
inequality [49, 52]. Let B_j be the bin of ball b_j, for 1 ≤ j ≤ k. Then Z_k is of the form Z_k = h(B_1, ..., B_k) 
where h satisfies the Lipschitz condition with constant 2, because moving one ball to a different 
bin results in changing the value of h by at most 2 with respect to the original value. The 
bounded-differences inequality specialized to this instance is as follows, for any d > 0: 

    \[ \Pr\bigl(Z_k \le E[Z_k] - d\sqrt{k}\bigr) \le \exp(-d^2/8) . \tag{5} \]

We use this inequality for d = \sqrt{k}/(2e). Then (5) implies the following bound: 

    \[ \Pr\bigl(Z_k \le k/(2e)\bigr) \le \exp\bigl(-k/(32e^2)\bigr) . \]

If we start a shrinking stage with k eligible balls then the number of balls eligible for the next stage 
is at most 

    \[ k - \frac{k}{2e} = \frac{k}{f} , \quad \text{where } f = \frac{2e}{2e-1} , \]

with probability at least 1 - exp(-k/(32e^2)). Let us continue shrinking stages as long as the inequality 

    \[ \frac{k}{32e^2} > 3\alpha \ln n \]

holds. We denote this inequality concisely as k > β ln n for β = 96e^2 α. Then the probability that 
every shrinking stage results in the size of the pool of eligible balls decreasing by a factor of at least 
1/f is at least 

    \[ (1 - n^{-3\alpha})^{\log_f n} \ge 1 - n^{-3\alpha} \log_f n \ge 1 - n^{-2\alpha} , \]

for sufficiently large n, by Bernoulli’s inequality. 

If all shrinking stages result in the size of the pool of eligible balls decreasing by a factor of at 
least 1/f, for f = 2e/(2e - 1), then the total number of eligible balls summed over all such stages is at most 

    \[ \sum_{i \ge 0} n f^{-i} = \frac{n}{1 - 1/f} = 2en . \]

In a restored stage, there are at most β ln n eligible balls. A restored stage happens to be the 
last one when all the balls become single after their placement, which occurs with probability at 
least 

    \[ \Bigl(\frac{n/(\beta \ln n) - \beta \ln n}{n/(\beta \ln n)}\Bigr)^{\beta \ln n} \ge 1 - \frac{\beta^3 \ln^3 n}{n} , \]

by Bernoulli’s inequality. It follows that there are more than β ln n restored stages with 
probability at most 

    \[ \Bigl(\frac{\beta^3 \ln^3 n}{n}\Bigr)^{\beta \ln n} . \]

This bound is at most n^{-2α} for sufficiently large n. 

Both events, one about shrinking stages and the other about restored stages, hold with 
probability at least 1 - 2n^{-2α} ≥ 1 - n^{-α}, for sufficiently large n. □ 

Next we summarize the performance of algorithm Common-Bounded-LV as a Las Vegas algorithm. 
In its proof, we rely on mapping executions of the β-process on executions of algorithm 
Common-Bounded-LV in a natural manner. 

Theorem 6 Algorithm Common-Bounded-LV terminates almost surely and there is no error 
when the algorithm terminates. For any α > 0 there exist β > 0 and c > 0 such that the algorithm 
terminates within time cn ln n using at most cn ln n random bits with probability 1 - n^{-α}. 

Proof: The condition controlling the main repeat-loop guarantees that an execution terminates 
only when the assigned names fill the interval [1, n], so they are distinct and there is no error. 

To analyze time performance, we consider the β-process of throwing balls into bins as considered 
in Lemma 5. Let β_1 > 0 be the number β specified in this Lemma, as determined by α replaced 
by 2α in its assumptions. This Lemma gives that the sum of all values of K summed over all 
shrinking stages is at most 2en with probability at least 1 - n^{-2α}. 

For a given K and a number i ∈ [1, K], procedure Verify-Collision is executed β ln n times, 
where β is the parameter in Figure 4. If there is a collision then it is detected with probability at 
least 1 - 2^{-β ln n}. We may take β_2 ≥ β_1 sufficiently large so that the inequality 
2en · 2^{-β_2 ln n} ≤ n^{-2α} holds. 

The total number of instances of executing Verify-Collision during an iteration of the main 
loop, while K is kept equal to n/(β ln n), is at most n. Observe that the inequality 
n · 2^{-β_2 ln n} ≤ n^{-2α} holds as well, because n < 2en. 

If β is set in Figure 4 to β_2 then one iteration of the outer repeat-loop suffices with probability at 
least 1 - 2n^{-2α}, for sufficiently large n. This is because verifications for collisions detect all existing 
collisions with this probability. Similarly, this one iteration takes O(n log n) time with probability 
that is at least 1 - 2n^{-2α}, for sufficiently large n. The claimed performance holds therefore with 
probability at least 1 - n^{-α}, for sufficiently large n. 

There are at least i iterations of the main repeat-loop with probability at most n^{-(i-1)α}, so the 
algorithm terminates almost surely. □ 

Algorithm Common-Bounded-LV is optimal among Las Vegas algorithms with respect to 
the following performance measures: the expected time O(n log n), given the amount O(1) of its 
available shared memory, by Corollary 1 in Section 3, and the expected number of random bits 
O(n log n), by Proposition 1 in Section 2. 


7 Las Vegas for Common with Unbounded Memory 

We consider now the last case when the number of processors n is known. The PRAM is of its 
Common variant, and there is an unbounded amount of shared memory. We propose a Las Vegas 
algorithm called Common-Unbounded-LV; the pseudocode for this algorithm is given in Figure 5. 
Subroutines of prefix-type, like computing the number of selections and the ranks of selected numbers, 
are not included in this pseudocode. The algorithm invokes procedure Verify-Collision, whose 
pseudocode is in Figure 1. 

An execution of algorithm Common-Unbounded-LV proceeds as a sequence of attempts to 
assign temporary names. When such an attempt results in assigning temporary names without 
duplicates then these transient names become final. An attempt begins from each processor selecting 
an integer from the interval [1, (β + 1)n] uniformly at random and independently, where β is a 
parameter such that only β > 1 is assumed. Next, for lg n steps, each processor executes procedure 
Verify-Collision(x) where x is the currently selected integer. If a collision is detected then a 
processor immediately selects another number in [1, (β + 1)n] and continues verifying for a collision. 
After lg n such steps, the processors count the total number of selections of different integers. If this 
number equals exactly n then the ranks of the selected integers are assigned as names, otherwise 
another attempt to find names is made. Computing the number of selections and the ranks takes 
time O(log n). In order to amortize this time O(log n) by verifications, such a computation of 
ranks is performed only after lg n verifications. Here a rank of a selected x is the number of selected 
numbers that are at most x. 

Algorithm Common-Unbounded-LV 

bin_v ← random integer in [1, (β + 1)n]    /* throw a ball into bin_v */ 
repeat 
    for i ← 1 to lg n do 
        if Verify-Collision(bin_v) then 
            bin_v ← random integer in [1, (β + 1)n] 
    number-occupied-bins ← total number of currently selected values for bin_v 
until number-occupied-bins = n 
name_v ← the rank of bin_v among nonempty bins 


Figure 5: A pseudocode for a processor v of a Common PRAM, where the number of 
shared memory cells is unbounded. The constant β is a parameter that satisfies the 
inequality β > 1. The private variable name stores the acquired name. 
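
The final naming step, in which ranks of the selected integers become names once exactly n different integers have been selected, can be sketched sequentially in Python. The function name names_from_bins is hypothetical, and sorting stands in for the prefix-type parallel subroutines that the pseudocode omits.

```python
def names_from_bins(selected):
    """Return the names given by ranks, or None if another attempt is needed.

    selected[v] is the integer currently selected by processor v.
    """
    distinct = sorted(set(selected))
    if len(distinct) != len(selected):
        return None                     # fewer than n occupied bins
    # Rank of x = number of selected values that are at most x.
    rank = {x: i + 1 for i, x in enumerate(distinct)}
    return [rank[x] for x in selected]

if __name__ == "__main__":
    print(names_from_bins([17, 3, 42, 8]))
    print(names_from_bins([5, 5, 1]))
```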

Balls into bins. We consider auxiliary processes of placing balls into bins that abstract 
operations on shared memory as performed by algorithm Common-Unbounded-LV. 

The β-process is about placing n balls into (β + 1)n bins. The process is structured as a 
sequence of stages. A stage represents an abstraction of one iteration of the inner for-loop in 
Figure 5 performed as if collisions were detected instantaneously and with certainty. When a ball 
is moved then it is placed in a bin selected uniformly at random, all such selections independent 
from one another. The stages are performed as follows. In the first stage, n balls are placed into 
(β + 1)n bins. When a bin is singleton in the beginning of a stage then the ball in the bin stays put 
through the stage. When a bin is multiple in the beginning of a stage, then all the balls in this bin 
participate actively in this stage: they are removed from the bin and placed in randomly-selected 
bins. The process terminates after a stage in which all balls reside in singleton bins. 
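
A small Python simulation may help to visualize the stages of this process. It is a sketch under assumptions: the function name beta_process_stages is hypothetical, bins are indexed from 0, and the number of bins (β + 1)n is rounded down to an integer.

```python
import random
from collections import Counter

def beta_process_stages(n, beta, rng):
    """Return (stages, throws, position) once every ball is in a singleton bin."""
    bins = int((beta + 1) * n)
    position = [rng.randrange(bins) for _ in range(n)]    # the first stage
    stages, throws = 1, n
    while True:
        load = Counter(position)
        movers = [v for v in range(n) if load[position[v]] > 1]
        if not movers:
            return stages, throws, position
        for v in movers:            # balls from multiple bins are relocated
            position[v] = rng.randrange(bins)
        stages += 1
        throws += len(movers)

if __name__ == "__main__":
    stages, throws, _ = beta_process_stages(500, 2, random.Random(7))
    print("stages:", stages, "ball throws:", throws)
```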

In analysis, it is convenient to visualize a stage as occurring by first removing all balls from 
multiple bins and then placing the removed balls in randomly selected bins one by one. We model 
placements of single balls by movements of a random walk. More precisely, we associate a mimicking 
walk to each execution of the β-process. Such a walk is performed on points with integer coordinates 
on a line, as explained next in detail. 

The mimicking walk proceeds through stages, similarly as the ball process. When we are to 
relocate k balls in a stage of the ball process then this is represented by the mimicking walk 
starting the corresponding stage at coordinate k. Suppose that we process a ball in a stage and 
the mimicking walk is at some position i. Placing this ball in an empty bin temporarily decreases 
the number of balls for the next stage; the respective action in the mimicking walk is to decrement 
its position from i to i - 1. Placing this ball in an occupied bin temporarily increases the number 
of balls for the next stage; the respective action in the mimicking walk is to increment its position 
from i to i + 1. The mimicking walk gives conservative estimates on the behavior of the ball 
process, as we show next. 

Lemma 6 If a stage of the mimicking walk ends at a position k, then the corresponding stage of 
the ball β-process ends with at most k balls to be relocated into bins in the next stage. 

Proof: The argument is broken into three cases, in which we consider what happens in the ball 
β-process and what are the corresponding actions in the mimicking walk. A number of balls in a 
bin in a stage is meant to be the final number of balls in this bin at the end of the stage. 

In the first case, just one ball is placed in a bin that begins the stage as empty. Then this ball 
will not be relocated in the next stage. This means that the number of balls for the next stage 
decreases by 1. At the same time, the mimicking walk decrements its position by 1. 

In the second case, some j ≥ 1 balls land in a bin that is singleton at the start of this stage, 
so the ball residing in it was not eligible for the stage. Then the number of balls in the bin becomes j + 1 
and this many balls will need to be relocated in the next stage. Observe that this contributes to 
incrementing the number of the eligible balls in the next stage by 1, because only the original ball 
residing in the singleton bin is added to the set of eligible balls, while the other balls participate in 
both stages. At the same time, the mimicking walk increments its position j times, by 1 each time. 

In the third and final case, some j ≥ 2 balls land in a bin that is empty at the start of this 
stage. Then this does not contribute to a change in the number of balls eligible for relocation in 
the next stage, as these j balls participate in both stages. Let us consider these balls as placed in 
the bin one by one. The first ball makes the mimicking walk decrement its position, as the ball is 
single in the bin. The second ball makes the walk increment its position, so that it returns to the 
original position as at the start of the stage. The following ball placements, if any, result in the 
walk incrementing its positions. □ 
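
The relation between a stage of the ball process and the mimicking walk can be checked experimentally. In the Python sketch below, which is an illustration only with a hypothetical function name run_stage, a stage removes all balls from multiple bins, places them back one by one while moving the walk, and then compares the final position of the walk with the number of balls to be relocated in the next stage.

```python
import random
from collections import Counter

def run_stage(position, bins, rng):
    """Perform one stage in place; return (walk_end, next_movers)."""
    load = Counter(position)
    movers = [v for v in range(len(position)) if load[position[v]] > 1]
    for v in movers:                 # first remove all balls from multiple bins
        load[position[v]] -= 1
    walk = len(movers)               # the mimicking walk starts at coordinate k
    for v in movers:                 # then place the removed balls one by one
        b = rng.randrange(bins)
        walk += 1 if load[b] > 0 else -1   # occupied bin: +1, empty bin: -1
        load[b] += 1
        position[v] = b
    next_movers = sum(1 for b in position if load[b] > 1)
    return walk, next_movers

if __name__ == "__main__":
    rng = random.Random(3)
    n, bins = 200, 600
    position = [rng.randrange(bins) for _ in range(n)]
    for stage in range(5):
        print(stage, run_stage(position, bins, rng))
```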

Random walks. Next we consider a random walk which will estimate the behavior of a ball 
process. One component of estimation is provided by Lemma 6, in that we will interpret a random 
walk as a mimicking walk for the ball process. 

The random walk is represented as movements of a marker placed on the non-negative side 
of the integer number line. The movements of the marker are by distance 1 at a time and they 
are independent from each other. The random β-walk has the marker’s position incremented with 
probability 1/(β + 1) and decremented with probability β/(β + 1). This may be interpreted as a sequence 
of independent Bernoulli trials, in which β/(β + 1) is chosen to be the probability of success. We will 
consider β > 1, for which β/(β + 1) > 1/(β + 1), which means that the probability of success is greater than 
the probability of failure. 

Such a random β-walk proceeds through stages, which are defined as follows. The first stage 
begins at position n. When a stage begins at a position k then it ends after k moves, unless it 
reaches the zero coordinate in the meantime. The zero point acts as an absorbing barrier, and when 
the walk’s position reaches it then the random walk terminates. This is the only way in which the 
walk terminates. A stage captures one round of PRAM’s computation and the number of moves in 
a stage represents the number of writes processors perform in a round. 
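
The staged walk can be simulated directly. The Python sketch below assumes that a move towards 0 occurs with probability β/(β + 1), which matches the fraction of bins guaranteed to be empty when n balls reside in (β + 1)n bins; the function name random_beta_walk is hypothetical.

```python
import random

def random_beta_walk(n, beta, rng):
    """Return (stages, moves) performed until absorption at 0."""
    p_toward_zero = beta / (beta + 1)
    pos, stages, moves = n, 0, 0
    while pos > 0:
        stages += 1
        for _ in range(pos):         # a stage starting at k lasts k moves
            pos += -1 if rng.random() < p_toward_zero else 1
            moves += 1
            if pos == 0:             # the zero point is an absorbing barrier
                break
    return stages, moves

if __name__ == "__main__":
    print(random_beta_walk(1000, 3, random.Random(11)))
```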


Lemma 7 For any numbers α > 0 and β > 1, there exists b > 0 such that the random β-walk 
starting at position n > 0 terminates within b ln n stages with all of them comprising O(n) moves 
with probability at least 1 - n^{-α}. 


Proof: Suppose the random walk starts at position k > 0 when a stage begins. Let X_k be the 
number of moves towards 0 and Y_k = k - X_k be the number of moves away from 0 in such a stage. 
The total distance covered towards 0, which we call a drift, is 

    \[ L(k) = X_k - Y_k = X_k - (k - X_k) = 2X_k - k . \]

The expected value of X_k is E[X_k] = βk/(β + 1) = μ_k. The event X_k < (1 - ε)μ_k holds with probability 
at most exp(-ε^2 μ_k / 2), by the Chernoff bound [52], so that X_k ≥ (1 - ε)μ_k occurs with probability 
at least 1 - exp(-ε^2 μ_k / 2). We say that such a stage is conforming when the event X_k ≥ (1 - ε)μ_k 
holds. 

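If a stage is conforming then the following inequality holds: 

    \[ L(k) \ge 2(1 - \varepsilon) \frac{\beta}{\beta + 1} k - k = \frac{\beta - 2\varepsilon\beta - 1}{\beta + 1} \cdot k . \]

We want the inequality (β - 2εβ - 1)/(β + 1) > 0 to hold, which is the case when ε < (β - 1)/(2β). 
Let us fix such ε > 0. 

Now the distance from 0 after k steps starting at k is 

    \[ k - L(k) \le \Bigl(1 - \frac{\beta - 2\varepsilon\beta - 1}{\beta + 1}\Bigr) k = \frac{2(1 + \beta\varepsilon)}{\beta + 1} \cdot k , \]

where 2(1 + βε)/(β + 1) < 1 for ε < (β - 1)/(2β). Let ρ = (β + 1)/(2(1 + βε)) > 1. Consecutive i 
conforming stages make the distance from 0 decrease by at least a factor ρ^i. 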

When we start the first stage at position n and the next log_ρ n stages are conforming then after 
these many stages the random walk ends up at a position that is close to 0. For our purposes, 
it suffices that the position is of distance at most s ln n from 0, for some s > 0, because of its 
impact on the probability estimates. Namely, the event that all these stages are conforming and 
the bound s ln n on distance from 0 holds, occurs with probability at least 

    \[ 1 - \log_\rho n \cdot \exp\Bigl(-\frac{\varepsilon^2}{2} \cdot \frac{\beta}{\beta + 1} \cdot s \ln n\Bigr) \ge 1 - \log_\rho n \cdot n^{-\frac{\varepsilon^2}{2} \frac{\beta}{\beta + 1} s} . \]

Let us choose s > 0 such that 

    \[ \log_\rho n \cdot n^{-\frac{\varepsilon^2}{2} \frac{\beta}{\beta + 1} s} \le \frac{1}{2n^\alpha} , \]

for sufficiently large n. 

Having fixed s, let us take t > 0 such that the distance covered towards 0 is at least s ln n when 
starting from k = t ln n and performing k steps. We interpret these movements as if this was a 
single conceptual stage for the sake of the argument, but its duration comprises all stages when we 
start from s ln n until we terminate at 0. It follows that the conceptual stage comprises at most 
t ln n real stages, because a stage takes at least one round. 

If this last conceptual stage is conforming then the distance covered towards 0 is bounded by 

    \[ L(k) \ge \frac{\beta - 2\varepsilon\beta - 1}{\beta + 1} \cdot k . \]

We want this to be at least s ln n for k = t ln n, which is equivalent to 

    \[ \frac{\beta - 2\beta\varepsilon - 1}{\beta + 1} \cdot t \ge s . \]

Now it is sufficient to take t > s · (β + 1)/(β - 2βε - 1). This last conceptual stage is not conforming with 
probability at most exp(-(ε^2/2) · (β/(β + 1)) · t ln n). Let us take t that is additionally big enough for the 
following inequality 

    \[ \exp\Bigl(-\frac{\varepsilon^2}{2} \cdot \frac{\beta}{\beta + 1} \cdot t \ln n\Bigr) = n^{-\frac{\varepsilon^2}{2} \frac{\beta}{\beta + 1} t} \le \frac{1}{2n^\alpha} \]

to hold. 

Having selected s and t, we can conclude that there are at most (s + t) ln n stages with probability 
at least 1 - n^{-α}. 


Now, let us consider only the total number of moves to the left X_m and to the right Y_m after m 
moves in total, when starting at position n. The event X_m < (1 - ε) · (β/(β + 1)) · m holds with probability 
at most exp(-(ε^2/2) · (β/(β + 1)) · m), by the Chernoff bound [55], so that X_m ≥ m · (1 - ε)β/(β + 1) occurs 
with at least the respective high probability 1 - exp(-(ε^2/2) · (β/(β + 1)) · m). At the same time, we have 
that the number of moves away from zero, which we denote Y_m, can be estimated to be 

    \[ Y_m = m - X_m \le m - m \cdot \frac{(1 - \varepsilon)\beta}{1 + \beta} = \frac{1 + \varepsilon\beta}{1 + \beta} \cdot m . \]

This gives an estimate on the corresponding drift: 

    \[ L(m) = X_m - Y_m \ge \frac{\beta - 2\varepsilon\beta - 1}{1 + \beta} \cdot m . \]

We want the inequality (β - 2εβ - 1)/(1 + β) > 0 to hold, which is the case when ε < (β - 1)/(2β). The drift 
is at least n, with the corresponding large probability, when m = d · n for d = (β + 1)/(β - 2εβ - 1). The 
drift is at least such with probability exponentially close to 1 in n, which is at least 1 - n^{-α} for 
sufficiently large n. □ 


Lemma 8 For any numbers α > 0 and β > 1, there exists b > 0 such that the β-process starting 
with n > 0 balls terminates within b ln n stages after performing O(n) ball throws with probability 
at least 1 - n^{-α}. 

Proof: We estimate the behavior of the β-process on n balls by the behavior of the random β-walk 
starting at position n. The justification of the estimation is in two steps. One is the property of 
mimicking walks given as Lemma 6. The other is provided by Lemma 7 and is justified as follows. 
The probabilities of decrementing and incrementing position in the random β-walk are such that 
they reflect the probabilities of landing in an empty bin or in an occupied bin. Namely, we use 
the facts that during executing the β-process, there are at most n occupied bins and at least β · n 
empty bins in any round. In the β-process, the probability of landing in an empty bin is at least 
βn/((β + 1)n) = β/(β + 1) and the probability of landing in an occupied bin is at most 
n/((β + 1)n) = 1/(β + 1). This means that the random β-walk is consistent with Lemma 6 in 
providing estimates on the time of termination of the β-process from above. □ 

Incorporating verifications. We consider the random β-walk with verifications, which is defined 
as follows. The process proceeds through stages, similarly as the regular random β-walk. For any 
round of the walk and a position at which the walk is, we first perform a Bernoulli trial with 
the probability 1/2 of success. Such a trial is referred to as a verification, which is positive when a 
success occurs, otherwise it is negative. A positive verification results in a movement of the marker 
as in the regular β-walk, otherwise the walk pauses at the given position for this round. 

Lemma 9 For any numbers α > 0 and β > 1, there exists b > 0 such that the random β-walk with 
verifications starting at position n > 0 terminates within b ln n stages with all of them comprising 
the total of O(n) moves with probability at least 1 - n^{-α}. 

Proof: We provide an extension of the proof of Lemma 7, which states a similar property of 
regular random β-walks. That proof estimated times of stages and the number of moves. Suppose 
the regular random β-walk starts at a position k, so that the stage takes k moves. There is a 
constant d < 1 such that the walk ends at a position at most dk with probability exponentially 
close to 1 in k. 

Moreover, the proof of Lemma 7 is such that all the values of k considered are at least logarithmic 
in n, which provides at most a polynomial bound on error. A random walk with verifications is 
slowed down by negative verifications. Observe that a random walk with verifications that is 
performed 3k times undergoes at least k positive verifications with probability exponentially close 
to 1 in k, by the Chernoff bound [52]. This means that the proof of Lemma 7 can be adapted to the case of 
random walks with verifications almost verbatim, with the modifications contributed by polynomial 
bounds on error of estimates of the number of positive verifications in stages. □ 

Next, we consider a $\beta$-process with verifications, which is defined as follows. The process
proceeds through stages, similarly to the regular ball process. The first stage starts with placing $n$
balls into $(\beta + 1)n$ bins. For any following stage, we first go through the multiple bins and, for each
ball in such a bin, we perform a Bernoulli trial with probability $\frac{1}{2}$ of success, which we call a
verification. A success in a trial is referred to as a positive verification, otherwise it is a negative
one. If at least one positive verification occurs for a ball in a multiple bin then all the balls in this
bin are relocated in this stage to bins selected uniformly at random and independently for each
such ball, otherwise the balls stay put in this bin until the next stage. The $\beta$-process terminates
when all the balls are singletons.
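
The process just defined can be simulated directly. The following is a minimal sketch for illustration only, not part of the analysis. It assumes an integer $\beta$ and a success probability of $\frac{1}{2}$ for each verification; the function name and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def beta_process_with_verifications(n, beta, rng, p=0.5):
    """Simulate the process; return (stages, ball throws, final bin of each ball)."""
    num_bins = (beta + 1) * n
    position = [rng.randrange(num_bins) for _ in range(n)]  # first stage: n throws
    stages, throws = 1, n
    while True:
        bins = defaultdict(list)
        for ball, b in enumerate(position):
            bins[b].append(ball)
        multiple = [balls for balls in bins.values() if len(balls) > 1]
        if not multiple:  # every ball is a singleton, so the process terminates
            return stages, throws, position
        stages += 1
        for balls in multiple:
            # one Bernoulli trial (a verification) per ball in the multiple bin
            verifications = [rng.random() < p for _ in balls]
            if any(verifications):  # a single positive verification suffices
                for ball in balls:
                    position[ball] = rng.randrange(num_bins)
                    throws += 1


if __name__ == "__main__":
    stages, throws, _ = beta_process_with_verifications(1000, 2, random.Random(0))
    print("stages:", stages, "ball throws:", throws)
```

Running the sketch for growing $n$ gives an empirical feel for how the number of stages and the total number of ball throws behave, which is what Lemma 10 below bounds.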

Lemma 10 For any numbers $a > 0$ and $\beta > 1$, there exists $b > 0$ such that the $\beta$-process with
verifications terminates within $b \ln n$ stages, with all of them comprising a total of $O(n)$ ball throws,
with probability at least $1 - n^{-a}$.

Proof: The argument proceeds by combining Lemma 6 with Lemma 9, similarly to how
Lemma 8 is proved by combining Lemma 6 with Lemma 7. The details follow.

For any execution of a ball process with verifications, we consider a “mimicking random walk,”
also with verifications, defined such that when a ball from a multiple bin is handled then the outcome
of a random verification for this ball is mapped on a verification for the corresponding random walk.
Observe that for a $\beta$-process with verifications just one positive verification is sufficient among $j - 1$
trials when there are $j > 1$ balls in a multiple bin, so a random $\beta$-walk with verifications provides
an upper bound on the time of termination of the $\beta$-process with verifications. The probabilities of
decrementing and incrementing the position in the random $\beta$-walk with verifications are such that they
reflect the probabilities of landing in an empty bin or in an occupied bin, similarly as without
verifications. All this gives a consistency of a $\beta$-walk with verifications with Lemma 6 in providing
estimates on the time of termination of the $\beta$-process from above. □

Next we summarize the performance of algorithm Common-Unbounded-LV as a Las Vegas algorithm.
The proof is based on mapping executions of the $\beta$-processes with verifications onto executions of
algorithm Common-Unbounded-LV in a natural manner.

Theorem 7 Algorithm Common-Unbounded-LV terminates almost surely, and when the algorithm
terminates then there is no error. For each $a > 0$ and any $\beta > 1$ in the pseudocode, there
exists $c > 0$ such that the algorithm assigns proper names within time $c \lg n$ and using at most
$c n \lg n$ random bits with probability at least $1 - n^{-a}$.

Proof: The algorithm terminates when there are $n$ different ranks, by the condition controlling the
repeat-loop. As ranks are distinct and each is in the interval $[1, n]$, each name is unique, so there is
no error. The repeat-loop is executed $O(1)$ times with probability at least $1 - n^{-a}$, by Lemma 10.
The repeat-loop is performed i times with probability that is at most so it converges to 0 

with i increasing. It follows that the algorithm terminates almost surely. 

An iteration of the repeat-loop in Figure 5 takes $O(\log n)$ steps. This is because of the following
two facts. First, it consists of $\lg n$ iterations of the for-loop, each taking $O(1)$ rounds. Second, it
concludes with verifying the until-condition, which is carried out by counting nonempty bins by a
prefix-type computation. It follows that the time until termination is $O(\log n)$ with probability
at least $1 - n^{-a}$.

By Lemma 10, the total number of ball throws is $O(n)$ with probability $1 - n^{-a}$. Each placement
of a ball requires $O(\log n)$ random bits, so the number of used random bits is $O(n \log n)$ with the
same probability. □

Algorithm Common-Unbounded-LV is optimal among Las Vegas naming algorithms with 
respect to the following performance measures: the expected time O(logn), by Theorem O the 
number of shared memory cells 0(n) used to achieve this running time, by Corollary [H and the 
expected number of random bits 0(nlogn), by Proposition [TJ 


8 Monte Carlo for Arbitrary with Bounded Memory 

We develop a Monte Carlo naming algorithm for an Arbitrary PRAM with a constant number
of shared memory cells, when the number of processors $n$ is unknown. The algorithm is called
Arbitrary-Bounded-MC and its pseudocode is given in Figure 6.

The underlying idea is to have all processors repeatedly attempt to obtain tentative names and 
terminate when the probability of duplicate names is gauged to be sufficiently small. To this end,


Algorithm Arbitrary-Bounded-MC

    initialize k ← 1   /* initial approximation of lg n */
    repeat
        initialize Last-Name ← name_v ← 0
        k ← 2k
        bin_v ← random integer in [1, 2^k]   /* throw a ball into a bin */
        repeat
            All-Named ← true
            if name_v = 0 then
                Pad ← bin_v
                if Pad = bin_v then
                    Last-Name ← Last-Name + 1
                    name_v ← Last-Name
                else
                    All-Named ← false
        until All-Named
    until Last-Name ≤ 2^{k/β}


Figure 6: A pseudocode for a processor $v$ of an Arbitrary PRAM with a constant
number of shared memory cells. The variables Last-Name, All-Named and Pad are
shared. The private variable name stores the acquired name. The constant $\beta > 0$ is a
parameter to be determined by analysis.


each processor writes an integer selected from a suitable “selection range” into a shared memory
register and next reads this register to verify whether the write was successful or not. A successful
write results in each such processor getting a tentative name by reading and incrementing another
shared register operating as a counter. One of the challenges here is to determine a selection range
from which random integers are chosen for writing. A good selection range is large enough with
respect to the number of writers, which is unknown, because when the range is too small then
multiple processors may select the same integer and so all of them get the same tentative name
after this integer gets written successfully. The algorithm keeps the size of a selection range growing
with each failed attempt to assign tentative names.

There is an inherent tradeoff here. On the one hand, we want to keep the amount of used
shared memory small, as a measure of efficiency of the algorithm. On the other hand, the
larger the range of memory, the smaller the probability of a collision of random selections from a
selection range, and so of the resulting duplicate names. Additionally, each increase of the selection
range costs time, while we also want to minimize the running time
as a metric of performance. The algorithm keeps increasing the selection range at a quadratic
rate, which turns out to be sufficient to optimize all the performance metrics we measure. The
algorithm terminates when the number of selected integers from the current selection range makes
a sufficiently small fraction of the size of the used range.

The structure of the pseudocode in Figure 6 is determined by the main repeat-loop. Each
iteration of this loop begins with doubling the variable $k$, which determines the selection range
$[1, 2^k]$. This means that the size of the selection range increases quadratically with consecutive
iterations of the main repeat-loop. A processor begins an iteration of the main loop by choosing an
integer uniformly at random from the current selection range $[1, 2^k]$. There is an inner repeat-loop,
nested within the main loop, which assigns tentative names depending on the random selections
just made.

All processors repeatedly write to a shared variable Pad and next read it to verify if the write
was successful. It is possible that different processors attempt to write the same value and then
verify that their write was successful. The shared variable Last-Name is used to proceed through
consecutive integers to provide tentative names to be assigned to the latest successful writers. When
multiple processors attempt to write the same value to Pad and it gets written successfully, then all
of them obtain the same tentative name. The variable Last-Name, at the end of each iteration of
the inner repeat-loop, equals the number of occupied bins. The shared variable All-Named is used
to verify if all processors have tentative names. The outer loop terminates when the number of
assigned names, which is the same as the number of occupied bins, is smaller than or equal to $2^{k/\beta}$,
where $\beta > 0$ is a parameter to be determined in analysis.
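
The behavior described above can be reproduced by a small sequential simulation. The following is a sketch under stated assumptions, not the authors' implementation: the Arbitrary write rule is modeled by letting a randomly chosen unnamed processor win each write to Pad (any other choice would be equally legitimate), $\beta$ is assumed to be an integer, and the function name is hypothetical.

```python
import random


def arbitrary_bounded_mc(n, beta, rng):
    """Return (names, k) for one simulated run with n anonymous processors."""
    k = 1
    while True:  # main repeat-loop
        k *= 2
        bins = [rng.randint(1, 2 ** k) for _ in range(n)]  # each processor throws a ball
        names = [0] * n
        last_name = 0
        while True:  # inner repeat-loop: one value wins the write to Pad per iteration
            unnamed = [v for v in range(n) if names[v] == 0]
            if not unnamed:
                break
            pad = bins[rng.choice(unnamed)]  # the write that happens to succeed
            last_name += 1
            for v in unnamed:
                if bins[v] == pad:  # every processor that wrote this value reads it back
                    names[v] = last_name
        # integer form of the test Last-Name <= 2^(k/beta)
        if last_name ** beta <= 2 ** k:
            return names, k


if __name__ == "__main__":
    names, k = arbitrary_bounded_mc(200, 3, random.Random(1))
    print("final k:", k, "distinct names:", len(set(names)))
```

In the simulation, processors that selected the same integer receive the same tentative name, so the number of distinct names equals the number of occupied bins, as stated above.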

Balls into bins. We consider the following auxiliary $\beta$-process of throwing balls into bins, for a
parameter $\beta > 0$. The process proceeds through stages identified by consecutive positive integers.
The $i$th stage has the number parameter $k$ equal to $k = 2^i$. During a stage, we first throw $n$ balls
into the corresponding $2^k$ bins and next count the number of occupied bins. A stage is last in an
execution of the $\beta$-process, and so the $\beta$-process terminates, when the number of occupied bins is
smaller than or equal to $2^{k/\beta}$.

We may observe that the $\beta$-process always terminates. This is because, by its specification, the
$\beta$-process terminates by the first stage in which the inequality $n \le 2^{k/\beta}$ holds, where $n$ is an upper
bound on the number of occupied bins in a stage. The inequality $n \le 2^{k/\beta}$ is equivalent to $n^{\beta} \le 2^k$
and so to $\beta \lg n \le k$. Since $k$ goes through consecutive powers of 2, we obtain that the number of
stages of the $\beta$-process with $n$ balls is at most $\lg(\beta \lg n) = \lg \beta + \lg \lg n$.

We say that such a $\beta$-process is correct when upon termination each ball is in a separate bin,
otherwise the process is incorrect.

Lemma 11 For any $a > 0$ there exists $\beta > 0$ such that the $\beta$-process is incorrect with probability
that is at most $n^{-a}$, for sufficiently large $n$.

Proof: The $\beta$-process is incorrect when there are collisions after the last stage. The probability
of the intersection of the events “$\beta$-process terminates” and “there are collisions” is bounded from
above by the probability of each of these two events. Next we show that, for each pair of $k$ and $n$,
at least one of these two events occurs with probability that is at most $n^{-a}$, for a suitable $\beta$.

First, we consider the event that the $\beta$-process terminates. The probability that there are at
most $2^{k/\beta}$ occupied bins is at most

$$\binom{2^k}{2^{k/\beta}} \left(\frac{2^{k/\beta}}{2^k}\right)^{n} \le \left(\frac{e\,2^k}{2^{k/\beta}}\right)^{2^{k/\beta}} \left(\frac{2^{k/\beta}}{2^k}\right)^{n} . \qquad (6)$$

We estimate from above the natural logarithm of the right-hand side of (6). We obtain the following
upper bound:

$$2^{k/\beta} + k(\beta^{-1} - 1)(n - 2^{k/\beta}) \ln 2 \le 2^{k/\beta} - \frac{1}{2}(n - 2^{k/\beta}) \ln 2 , \qquad (7)$$

for $\beta > 4/3$, as $k \ge 2$. The estimate (7) is at most $-n \cdot \frac{\ln 2}{4}$ when $2^{k/\beta} \le n \cdot \delta$, for
$\delta = \frac{\ln 2}{2(2 + \ln 2)}$, by a direct algebraic verification. These restrictions on $k$ and $\beta$ can be restated as

$$k \le \beta \lg(n\delta) \quad \text{and} \quad \beta > 4/3 . \qquad (8)$$

When condition (8) is satisfied, then the probability of at most $2^{k/\beta}$ occupied bins is at most

$$\exp\left(-n \cdot \frac{\ln 2}{4}\right) \le n^{-a} ,$$

for sufficiently large $n$.

Next, let us consider the probability of collisions occurring. Collisions do not occur with
probability that is at least

$$\left(1 - \frac{n}{2^k}\right)^{n} \ge 1 - \frac{n^2}{2^k} ,$$

by Bernoulli's inequality. It follows that the probability of collisions occurring can be bounded
from above by $\frac{n^2}{2^k}$. This bound in turn is at most $n^{-a}$ when

$$k \ge (2 + a) \lg n . \qquad (9)$$

In order to have at least one of the inequalities (8) and (9) hold for any $k$ and $n$, it is sufficient to have

$$(2 + a) \lg n \le \beta \lg(n\delta) .$$

This determines $\beta$ as follows:

$$\beta \ge \frac{(2 + a) \lg n}{\lg n + \lg \delta} \to 2 + a$$

with $n \to \infty$. We obtain that the inequality $\beta > 2 + a$ suffices, for $n$ that is large enough. □
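
The case analysis in this proof can be checked numerically for concrete parameters. The sketch below is an illustration only: it assumes the constant $\delta = \frac{\ln 2}{2(2+\ln 2)}$ and the two bounds $\exp(-n \ln 2 / 4)$ and $n^2/2^k$ used in the proof, and all function names are hypothetical.

```python
import math

# The constant delta from the proof (assumed form: ln 2 / (2 (2 + ln 2))).
DELTA = math.log(2) / (2 * (2 + math.log(2)))


def termination_bound(n):
    """Bound exp(-n ln 2 / 4) on terminating while 2^(k/beta) <= n * delta."""
    return math.exp(-n * math.log(2) / 4)


def collision_bound(n, k):
    """Bound n^2 / 2^k on the probability of a collision among n balls in 2^k bins."""
    return n * n / 2 ** k


def stage_error_bound(n, k, beta):
    """Bound on the probability that the process is incorrect because of stage k."""
    if k <= beta * math.log2(n * DELTA):  # the range of k in which early termination is unlikely
        return termination_bound(n)
    return min(1.0, collision_bound(n, k))  # otherwise rely on the collision bound


if __name__ == "__main__":
    n, a, beta = 2 ** 20, 1, 4
    for k in (2, 4, 8, 16, 32, 64, 128):
        print(k, stage_error_bound(n, k, beta), "target:", n ** (-a))
```

For $n = 2^{20}$, $a = 1$ and $\beta = 4$, the sufficient condition $(2+a) \lg n \le \beta \lg(n\delta)$ holds, so every value of $k$ is covered by one of the two bounds.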


Lemma 12 For each $\beta > 0$ there exists $c > 0$ such that when the $\beta$-process terminates then the
number of bins ever needed is at most $cn$ and the number of random bits ever generated is at most
$cn \ln n$.

Proof: The $\beta$-process terminates by the stage in which the inequality $n \le 2^{k/\beta}$ holds, so $k$ gets to
be at most $\beta \lg n$. We partition the range $[2, \beta \lg n]$ of values of $k$ into two subranges and consider
them separately.

First, when $k$ ranges from 2 to $\lg n$ through the stages, then the numbers of needed bins increase
quadratically through the stages, because $k$ is doubled with each transition to the next stage. This
means that the total number of all these bins is $O(n)$. At the same time, the number of random
bits increases geometrically through the stages, so the total number of random bits a processor uses
is $O(\log n)$.

Second, when $k$ ranges from $\lg n$ to $\beta \lg n$, the number of needed bins is at most $n$ in each stage.
There are only $\lg(\beta + 1)$ such stages, so the total number of all these bins is $\lg(\beta + 1) \cdot n$. At the
same time, a processor uses at most $\beta \lg n$ random bits in each of these stages. □

There is a direct correspondence between iterations of the outer repeat-loop and stages of a
$\beta$-process. The $i$th stage has the number $k$ equal to the value of $k$ during the $i$th iteration of
the outer repeat-loop of algorithm Arbitrary-Bounded-MC, that is, we have $k = 2^i$. We map
an execution of the algorithm into a corresponding execution of a $\beta$-process in order to apply
Lemmas 11 and 12 in the proof of Theorem 8, which summarizes the performance of algorithm
Arbitrary-Bounded-MC and justifies that it is Monte Carlo.

Theorem 8 Algorithm Arbitrary-Bounded-MC always terminates, for any $\beta > 0$. For each
$a > 0$ there exist $\beta > 0$ and $c > 0$ such that the algorithm assigns unique names, works in time at
most $cn$, and uses at most $cn \ln n$ random bits, all this with probability at least $1 - n^{-a}$.

Proof: The number of stages of the $\beta$-process with $n$ balls is at most $\lg(\beta \lg n) = \lg \beta + \lg \lg n$.
This is also an upper bound on the number of iterations of the main repeat-loop. We conclude that
the algorithm always terminates.

The number of bins available in a stage is an upper bound on the number of bins occupied in this
stage. The number of bins occupied in a stage equals the number of times the inner repeat-loop is
iterated, because executing instruction Pad ← bin eliminates one occupied bin. It follows that the
number of bins ever needed is an upper bound on the time of the algorithm. The number of iterations
of the inner repeat-loop is recorded in the variable Last-Name, so the termination condition of the
algorithm corresponds to the termination condition of the $\beta$-process.

When the $\beta$-process is correct then this means that the processors obtain distinct names. We
conclude that Lemmas 11 and 12 apply when understood about the behavior of the algorithm.
This implies the following: the names are correct and the execution terminates in $O(n)$ time while
$O(n \log n)$ bits are used, all this with probability that is at least $1 - n^{-a}$. □

Algorithm Arbitrary-Bounded-MC is optimal with respect to the following performance 
measures: the expected time 0{n), by Theorem]!] the expected number of random bits O(nlogn), 
by Proposition [T] and the probability of error by Proposition [3] 


9 Monte Carlo for Arbitrary with Unbounded Memory 

We develop a Monte Carlo naming algorithm for an Arbitrary PRAM with an unbounded amount of
shared registers, when the number of processors $n$ is unknown. The algorithm is called
Arbitrary-Unbounded-MC and its pseudocode is given in Figure 7.


Algorithm Arbitrary-Unbounded-MC

    initialize k ← 1   /* initial approximation of lg n */
    repeat
        initialize All-Named ← true
        initialize position_v ← (0, 0)
        k ← r(k)
        bin_v ← random integer in [1, 2^k/(βk)]   /* choose a bin for the ball */
        label_v ← random integer in [1, 2^{βk}]   /* choose a label for the ball */
        for i ← 1 to βk do
            if position_v = (0, 0) then
                Pad[bin_v] ← label_v
                if Pad[bin_v] = label_v then
                    Last-Name[bin_v] ← Last-Name[bin_v] + 1
                    position_v ← (bin_v, Last-Name[bin_v])
        if position_v = (0, 0) then
            All-Named ← false
    until All-Named
    name_v ← the rank of position_v


Figure 7: A pseudocode for a processor $v$ of an Arbitrary PRAM, when the number
of shared memory cells is unbounded. The variables Pad and Last-Name are arrays of
shared memory cells, the variable All-Named is shared as well. The private variable
name stores the acquired name. The constant $\beta > 0$ and an increasing function $r(k)$
are parameters.

The underlying idea is to parallelize the process of selection of names applied in Section 8 in
algorithm Arbitrary-Bounded-MC so that multiple processors could acquire information in the
same round that later would allow them to obtain names. As algorithm Arbitrary-Bounded-MC
used shared registers Pad and Last-Name, the new algorithm uses arrays of shared registers
playing similar roles. The values read off from Last-Name cannot be used directly as names, because
multiple processors can read the same values, so we need to distinguish between these values to
assign names. To this end, we assign ranks to processors based on their lexicographic ordering by
pairs of numbers determined by Pad and Last-Name.

The pseudocode in Figure 7 is structured as a repeat-loop. In the first iteration, the parameter
$k$ equals 1, and in subsequent ones it is determined by iterations of the increasing integer-valued
function $r(k)$, which is a parameter. We consider two instantiations of the algorithm, determined
by $r(k) = k + 1$ and by $r(k) = 2k$. In one iteration of the main repeat-loop, a processor uses two
variables bin $\in [1, 2^k/(\beta k)]$ and label $\in [1, 2^{\beta k}]$, which are selected independently and uniformly
at random from the respective ranges.

We interpret bin as a bin's number and label as a label for a ball. Processors write their
values label into the respective bin by instruction Pad[bin] ← label and verify what value got
written. After a successful write, a processor increments Last-Name[bin] and assigns the pair
(bin, Last-Name[bin]) as its position. This is repeated $\beta k$ times by way of iterating the inner
for-loop. This loop has a specific upper bound $\beta k$ on the number of iterations because we want
to ascertain that there are at most $\beta k$ balls in each bin. The main repeat-loop terminates when
all values attempted to be written actually get written. Then processors assign themselves names
according to the ranks of their positions. The array Last-Name is assumed to be initialized to 0's,
and in each iteration of the repeat-loop we use a fresh region of shared memory to allocate this
array.
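
A sequential simulation may help to see how positions and ranks arise. The sketch below is an illustration under stated assumptions, not the authors' implementation: the Arbitrary write rule is modeled by a randomly chosen writer winning in each bin, $\beta$ is an integer, the number of bins $2^k/(\beta k)$ is rounded down and kept at least 1, and the rank of a position is taken among the distinct positions in lexicographic order. The function name is hypothetical.

```python
import random
from collections import defaultdict


def arbitrary_unbounded_mc(n, beta, r, rng):
    """Return (names, k) for one simulated run; r is the schedule function r(k)."""
    k = 1
    while True:  # main repeat-loop
        k = r(k)
        num_bins = max(1, 2 ** k // (beta * k))  # assumption: floor, at least one bin
        bins = [rng.randint(1, num_bins) for _ in range(n)]
        labels = [rng.randint(1, 2 ** (beta * k)) for _ in range(n)]
        position = [None] * n  # None plays the role of (0, 0)
        last_name = defaultdict(int)  # fresh array Last-Name, initialized to 0's
        for _ in range(beta * k):  # inner for-loop
            writers = defaultdict(list)
            for v in range(n):
                if position[v] is None:
                    writers[bins[v]].append(v)
            for b, vs in writers.items():
                pad = labels[rng.choice(vs)]  # the label that gets written to Pad[b]
                last_name[b] += 1
                for v in vs:
                    if labels[v] == pad:  # successful writers take the new position
                        position[v] = (b, last_name[b])
        if all(p is not None for p in position):  # All-Named
            rank = {p: i + 1 for i, p in enumerate(sorted(set(position)))}
            return [rank[p] for p in position], k


if __name__ == "__main__":
    names, k = arbitrary_unbounded_mc(64, 2, lambda k: 2 * k, random.Random(0))
    print("final k:", k, "distinct names:", len(set(names)))
```

Two processors obtain the same name in this simulation exactly when they chose the same bin and the same label, which is the kind of error considered in the analysis that follows.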

Balls into bins. We consider a related process of placing labeled balls into bins, which is referred
to as the $\beta$-process. Such a process proceeds through stages and is parametrized by a function $r(k)$. In
the first stage, we have $k = 1$, and given some value of $k$ in a stage, the next stage has this parameter
equal to $r(k)$. In a stage with a given $k$, we place $n$ balls into $2^k/(\beta k)$ bins, with labels from $[1, 2^{\beta k}]$.
The selections of bins and labels are performed independently and uniformly at random. A stage
terminates the $\beta$-process when there are at most $\beta k$ labels of balls in each bin.

Lemma 13 The $\beta$-process always terminates.

Proof: The $\beta$-process terminates by a stage in which the inequality $n \le \beta k$ holds, because $n$ is an
upper bound on the number of balls in a bin. This always occurs when function $r(k)$ is increasing.
□


We expect the $\beta$-process to terminate earlier, as Lemma 17 states.

Lemma 14 For each $a > 0$, if $k \le \lg n - 2$ and $\beta \ge 1 + a$ then the probability of halting in the
stage is smaller than $n^{-a}$, for sufficiently large $n$.


Proof: We show that when $k$ is suitably small then the probability of at most $\beta k$ different labels
in each bin is small. There are $n$ balls placed into $2^k/(\beta k)$ bins, so there are at least $\frac{n \beta k}{2^k}$ balls in
some bin, by the pigeonhole principle. We consider these balls and their labels.

The probability that all these balls have at most $\beta k$ labels is at most

$$\binom{2^{\beta k}}{\beta k} \left(\frac{\beta k}{2^{\beta k}}\right)^{\frac{n \beta k}{2^k}} \le \left(\frac{e\,2^{\beta k}}{\beta k}\right)^{\beta k} \left(\frac{\beta k}{2^{\beta k}}\right)^{\frac{n \beta k}{2^k}} . \qquad (10)$$

We want to show that this is at most $n^{-a}$. We compare the binary logarithms of $n^{-a}$ and the
right-hand side of (10), and want the following inequality to hold:

$$\lg e \cdot \beta k + \left(\frac{n}{2^k} - 1\right) \beta k \, (\lg(\beta k) - \beta k) \le -a \lg n ,$$

which is equivalent to the following inequality, by algebra:

$$\frac{n}{2^k} \ge 1 + \frac{\lg e}{\beta k - \lg(\beta k)} + \frac{a \lg n}{\beta k (\beta k - \lg(\beta k))} . \qquad (11)$$

Observe now that, assuming $\beta \ge a + 1$, if $k \le \sqrt{\lg n}$ then the right-hand side of (11) is at most
$O(1) + \lg n$ while the left-hand side is at least $n / 2^{\sqrt{\lg n}}$, and when $\sqrt{\lg n} < k \le \lg n - 2$ then the right-hand
side of (11) is at most 3 while the left-hand side is at least 4, for sufficiently large $n$. □

We say that a label collision occurs, in a configuration produced by the process, if some bin
contains two balls with the same label.


Lemma 15 For any $a > 0$, if $k > \frac{1}{2} \lg n$ and $\beta \ge 4a + 7$ then the probability of a label collision is
smaller than $n^{-a}$.


Proof: The number of pairs of a bin number and a label is $2^k \cdot 2^{\beta k}/(\beta k)$. It follows that the
probability that no two balls in the same bin obtain the same label is at least

$$\left(1 - \frac{n}{2^{k+\beta k}/(\beta k)}\right)^{n} \ge 1 - \frac{n^2}{2^{k+\beta k}/(\beta k)} ,$$

by Bernoulli's inequality. So the probability that two different balls in the same bin obtain the same
label is at most $\frac{n^2}{2^{k+\beta k}/(\beta k)}$. We want the following inequality to hold:

$$\frac{n^2}{2^{k+\beta k}/(\beta k)} \le n^{-a} .$$

This is equivalent to the inequality obtained by taking logarithms,

$$(2 + a) \lg n \le (1 + \beta) k - \lg(\beta k) ,$$

which holds when $(2 + a) \lg n \le \frac{1+\beta}{2} k$. It follows that it is sufficient for $k$ to satisfy

$$k \ge \frac{2(2 + a)}{1 + \beta} \lg n .$$

This inequality holds for $k > \frac{1}{2} \lg n$ when $\beta \ge 4a + 7$. □

We say that such a $\beta$-process is correct when upon termination no label collision occurs,
otherwise the process is incorrect.


Lemma 16 For any $a > 0$, there exists $\beta > 0$ such that the $\beta$-process is incorrect with probability
that is at most $n^{-a}$, for sufficiently large $n$.

Proof: The $\beta$-process is incorrect when there is a label collision after the last stage. The probability
of the intersection of the events “$\beta$-process terminates” and “there are label collisions” is bounded
from above by the probability of any one of these events. Next we show that, for each pair of $k$
and $n$, at least one of these two events occurs with probability that is at most $n^{-a}$, for a suitable $\beta$.

To this end we use Lemmas 14 and 15, in which we substitute $2a$ for $a$. We obtain that, on
the one hand, if $k \le \lg n - 2$ and $\beta \ge 1 + 2a$ then the probability of halting is smaller than $n^{-2a}$,
and, on the other hand, that if $k > \frac{1}{2} \lg n$ and $\beta \ge 8a + 7$ then the probability of a label collision
is smaller than $n^{-2a}$. Every value of $k$ falls into at least one of these two ranges once $\lg n \ge 4$.
It follows that at least one of the two considered events occurs with probability at
most $2n^{-2a}$ for sufficiently large $\beta$ and any sufficiently large $n$. This probability is at most $n^{-a}$, for
sufficiently large $n$. □

Lemma 17 For any $a > 0$, there exist $\beta > 0$ and $c > 0$ such that the following two facts about the
$\beta$-process hold. If $r(k) = k + 1$ then at most $cn/\ln n$ bins are ever needed and $cn \ln^2 n$ random bits
are ever generated, each among these properties occurring with probability that is at least $1 - n^{-a}$.
If $r(k) = 2k$ then at most $cn^2/\ln n$ bins are ever needed and $cn \ln n$ random bits are ever generated,
each among these properties occurring with probability that is at least $1 - n^{-a}$.

Proof: We throw $n$ balls into $2^k/(\beta k)$ bins. As $k$ keeps increasing, the probability of termination
increases as well, because both $2^k/(\beta k)$ and $\beta k$ increase as functions of $k$. Let us take $k = 1 + \lg n$,
so that the number of bins is $\frac{2n}{\beta k}$. We want to show that the probability that some bin contains more
than $\beta k$ balls is suitably small.

Let us consider a specific bin and let $X$ be the number of balls in this bin. The expected number
of balls in the bin is $\mu = \frac{\beta k}{2}$. We use the Chernoff bound for a sequence of Bernoulli trials in the
form of

$$\Pr(X > (1 + \varepsilon)\mu) < \exp(-\varepsilon^2 \mu / 3) ,$$

which holds for $0 < \varepsilon < 1$, see [52]. Let us choose $\varepsilon = \frac{1}{2}$, so that $1 + \varepsilon = \frac{3}{2}$ and
$(1 + \varepsilon)\mu = \frac{3}{4} \beta k$. We obtain the following bound

$$\Pr(X > \beta k) \le \Pr\left(X > \tfrac{3}{4} \beta k\right) < \exp\left(-\frac{1}{4} \cdot \frac{\beta k}{6}\right) = \exp\left(-\frac{\beta}{24} \cdot (1 + \lg n)\right) ,$$

which can be made smaller than $n^{-1-a}$ for $\beta$ sufficiently large with respect to $a$, and sufficiently
large $n$. Using the union bound over at most $n$ bins, some bin contains more than $\beta k$ balls with probability at
most $n^{-a}$. This implies that termination occurs as soon as $k$ reaches or surpasses $k = 1 + \lg n$, with
the corresponding large probability $1 - n^{-a}$.

In the case of $r(k) = k + 1$, the consecutive integer values of $k$ are tried, so the $\beta$-process
terminates by the time $k = 1 + \lg n$, and for this $k$ the number of bins needed is $O(n/\log n)$. To
choose a bin for any value of $k$ requires at most $k$ random bits, so implementing such choices for
$k = 1, 2, \ldots, 1 + \lg n$ requires $O(\log^2 n)$ random bits per processor.

In the case of $r(k) = 2k$, the $\beta$-process terminates by the time the magnitude of $k$ reaches
$2(1 + \lg n)$, and for this value of $k$ the number of bins needed is $O(n^2/\log n)$. As $k$ progresses
through consecutive powers of 2, the sum of these numbers is a sum of a geometric progression,
and so is of the order of the maximum term, that is $O(\log n)$, which is the number of random bits
per processor. □
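
The contrast between the two schedules can be made concrete with a small calculation. The sketch below is an illustration only. It assumes that the process stops at the first value of $k$ that reaches $1 + \lg n$, which by the proof above happens with high probability, and it charges $(1 + \beta)k$ random bits per processor in a stage ($k$ bits for the bin and $\beta k$ bits for the label). The function names are hypothetical.

```python
import math


def values_of_k(n, r):
    """Values of k tried by schedule r until k first reaches 1 + lg n."""
    target = 1 + math.log2(n)
    k, tried = 1, []
    while True:
        k = r(k)
        tried.append(k)
        if k >= target:
            return tried


def resources(n, beta, r):
    """Return (last k, bins in the last stage, random bits per processor)."""
    ks = values_of_k(n, r)
    last = ks[-1]
    bins_in_last_stage = 2 ** last / (beta * last)
    bits_per_processor = sum((1 + beta) * k for k in ks)
    return last, bins_in_last_stage, bits_per_processor


if __name__ == "__main__":
    n, beta = 1024, 2
    print("r(k) = k + 1:", resources(n, beta, lambda k: k + 1))
    print("r(k) = 2k:   ", resources(n, beta, lambda k: 2 * k))
```

For $n = 1024$ the schedule $r(k) = k + 1$ passes through ten values of $k$ and ends with fewer bins, while $r(k) = 2k$ passes through four values and ends with more bins but fewer random bits per processor.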

There is a direct correspondence between iterations of the outer repeat-loop of algorithm
Arbitrary-Unbounded-MC and stages of the $\beta$-process. We map an execution of the algorithm
into a corresponding execution of a $\beta$-process in order to apply Lemmas 16 and 17 in the proof of the
following Theorem, which summarizes the performance of algorithm Arbitrary-Unbounded-MC
and justifies that it is Monte Carlo.

Theorem 9 Algorithm Arbitrary-Unbounded-MC always terminates, for any $\beta > 0$. For
each $a > 0$, there exist $\beta > 0$ and $c > 0$ such that the algorithm assigns unique names and has
the following additional properties with probability $1 - n^{-a}$. If $r(k) = k + 1$ then at most $cn/\ln n$
memory cells are ever needed, $cn \ln^2 n$ random bits are ever generated, and the algorithm terminates
in time $O(\log^2 n)$. If $r(k) = 2k$ then at most $cn^2/\ln n$ memory cells are ever needed, $cn \ln n$ random
bits are ever generated, and the algorithm terminates in time $O(\log n)$.

Proof: The algorithm always terminates by Lemma 13. By Lemma 16, the algorithm assigns
correct names with probability that is at least $1 - n^{-a}$. The remaining properties follow from
Lemma 17, because the number of bins is proportional to the number of memory cells and the
number of random bits per processor is proportional to time. □

The instantiations of algorithm Arbitrary-Unbounded-MC are close to optimality with respect
to some of the performance metrics we consider, depending on whether $r(k) = k + 1$ or
$r(k) = 2k$. If $r(k) = k + 1$ then the algorithm's use of shared memory would be optimal if its
time were $O(\log n)$, by Theorem 2, but as it is, the algorithm misses space optimality by at most a
logarithmic factor, since the algorithm's running time is $O(\log^2 n)$. Similarly, if $r(k) = k + 1$ then
the number of random bits ever generated, $O(n \log^2 n)$, misses optimality by at most a logarithmic
factor, by Proposition 1. On the other hand, if $r(k) = 2k$ then the expected time $O(\log n)$ is optimal,
by Theorem 2, the expected number of random bits $O(n \log n)$ is optimal, by Proposition 1,
and the probability of error is optimal, by Proposition 3, but the amount of used shared
memory misses optimality by at most a polynomial factor, by Theorem 2.


10 Monte Carlo for Common with Bounded Memory 

The Monte Carlo algorithm Common-Bounded-MC, which we present in this section, solves the
naming problem for a Common PRAM with a constant number of shared read-write registers,
when the number of processors $n$ is unknown. The algorithm has its pseudocode in Figure 10. To
make the exposition of this algorithm more modular, we use two procedures Estimate-Size and
Extend-Names. The pseudocodes of these procedures are given in Figures 8 and 9, respectively.
The private variables in the pseudocode in Figure 10 have the following meaning: size is an
approximation of the number of processors $n$, and number-of-bins determines the size of the
range of bins we throw conceptual balls into.

The main task of procedure Estimate-Size is to produce an estimate of the number $n$ of
processors. Procedure Extend-Names is iterated multiple times, and each iteration is intended to
assign names to a group of processors. This is accomplished by the processors selecting integer
values at random, interpreted as throwing balls into bins, and verifying for collisions. Each selection
of a bin is followed by a collision detection. A ball placement without a detected collision results
in a name assigned, otherwise the involved processors try again to throw balls into a range of bins.
The effectiveness of the resulting algorithm hinges on calibrating the number of bins to the expected
number of balls to be thrown.

Balls into bins for the first time. The role of procedure Estimate-Size, when called by
algorithm Common-Bounded-MC, is to estimate the unknown number of processors $n$, which is
returned as size, to assign a value to variable number-of-bins, and to assign values to each private
variable bin, which indicates the number of a selected bin in the range [1, number-of-bins].


Procedure Estimate-Size

    initialize k ← 2   /* initial approximation of lg n */
    repeat
        k ← k + 1
        bin_v ← random integer in [1, k 2^k]
        initialize Nonempty-Bins ← 0
        for i ← 1 to k 2^k do
            if bin_v = i then
                Nonempty-Bins ← Nonempty-Bins + 1
    until Nonempty-Bins ≤ 2^k
    return (3 · 2^k, k 2^k)   /* 3 · 2^k is size, k 2^k is number-of-bins */


Figure 8: A pseudocode for a processor $v$ of a Common PRAM. This procedure is
invoked by algorithm Common-Bounded-MC in Figure 10. The variable Nonempty-Bins
is shared.


The procedure tries consecutive values of $k$ as approximations of $\lg n$. For a given $k$, an experiment is
carried out to throw $n$ balls into $k 2^k$ bins. The execution stops when the number of occupied bins
is at most $2^k$, and then $3 \cdot 2^k$ is treated as an approximation of $n$ and $k 2^k$ is the returned number
of bins.
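
The experiment performed by Estimate-Size is easy to reproduce sequentially. The following is a minimal sketch for illustration, which abstracts away the shared-memory counting of nonempty bins and simply counts the distinct bins selected; the function name is hypothetical.

```python
import random


def estimate_size(n, rng):
    """Return (size, number_of_bins) as the procedure would for n processors."""
    k = 2
    while True:
        k += 1
        num_bins = k * 2 ** k
        # n processors each select a bin; count the occupied (nonempty) bins
        occupied = len({rng.randint(1, num_bins) for _ in range(n)})
        if occupied <= 2 ** k:
            return 3 * 2 ** k, num_bins


if __name__ == "__main__":
    n = 100
    size, number_of_bins = estimate_size(n, random.Random(0))
    print("n:", n, "size:", size, "number-of-bins:", number_of_bins)
```

Since the number of occupied bins never exceeds $n$, the loop stops no later than at the first $k$ with $2^k \ge n$, which is the reason why the returned estimate cannot be too large.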

Lemma 18 For $n > 20$ processors, procedure Estimate-Size returns an estimate size of $n$ such
that the inequality size $\le 6n$ holds with certainty and the inequality $n <$ size holds with probability
$1 - 2^{-\Omega(n)}$.


Proof: The procedure returns $3 \cdot 2^k$, for some integer $k > 0$. We interpret selecting of values for
variable bin in an iteration of the main repeat-loop as throwing $n$ balls into $k 2^k$ bins; here $k = j + 2$
in the $j$th iteration of this loop, because the smallest value of $k$ is 3. Clearly, $n$ is an upper bound
on the number of occupied bins.

If $n$ is a power of 2, say $n = 2^i$, then the procedure terminates by the time $i = k$, so that
$2^k \le 2^{i+1} = 2n$. Otherwise, the maximum possible $k$ equals $\lceil \lg n \rceil$, because
$2^{\lfloor \lg n \rfloor} < n < 2^{\lceil \lg n \rceil}$.
This gives $2^{\lceil \lg n \rceil} = 2^{\lfloor \lg n \rfloor + 1} < 2n$. We obtain that the inequality $2^k \le 2n$ occurs with certainty,
and so does $3 \cdot 2^k \le 6n$ as well.


Now we estimate the lower bound on $2^k$. Consider $k$ such that $2^k < n/3$. Then $n$ balls fall into
at most $2^k$ bins with probability that is at most

$$\binom{k2^k}{2^k}\left(\frac{2^k}{k2^k}\right)^{n} \le \left(\frac{ek2^k}{2^k}\right)^{2^k}\left(\frac{1}{k}\right)^{n} = e^{2^k}k^{2^k-n} \le e^{n/3}k^{-2n/3}. \tag{12}$$

The right-hand side of (12) is at most $e^{-n/3}$ when the inequality $k > e$ holds. The smallest $k$
considered in the pseudocode in Figure 8 is $k = 3 > e$. The inequality $k > e$ is consistent with
Procedure Extend-Names

initialize Collision-Detected ← collision_v ← false
for i ← 1 to number-of-bins do
    if bin_x = i for some processor x then
        if bin_v = i then
            for j ← 1 to β lg size do
                if Verify-Collision then
                    Collision-Detected ← collision_v ← true
            if not collision_v then
                Last-Name ← Last-Name + 1
                name_v ← Last-Name
                bin_v ← 0
if (number-of-bins > size) then
    number-of-bins ← size
if collision_v then
    bin_v ← random integer in [1, number-of-bins]


Figure 9: A pseudocode for a processor v of a Common PRAM. This procedure
invokes procedure Verify-Collision, whose pseudocode is in Figure [H, and is itself
invoked by algorithm Common-Bounded-MC in Figure 10. The variables
Last-Name and Collision-Detected are shared. The private variable name stores the
acquired name. The constant $\beta > 0$ is to be determined in analysis.


$2^k < n/3$ when $n > 20$. The number of possible values for $k$ is $O(\log n)$, so the probability of the
procedure returning for $2^k < n/3$ is $e^{-n/3}\cdot O(\log n) = 2^{-\Omega(n)}$. □

The behavior of procedure Extend-Names can also be interpreted as throwing balls into bins, where
the ball of a processor $v$ is in a bin $x$ when bin$_v = x$. The procedure first verifies the suitable range
of bins [1, number-of-bins] for collisions. A verification for collisions takes either just a constant
time or $O(\log n)$ time.

A constant-time verification occurs when there is no ball in the considered bin $i$, which is verified
when the line “if bin$_x = i$ for some processor $x$” in the pseudocode in Figure 9 is to be executed.
Such a verification is performed by using a shared register initialized to 0, into which all processors $v$
with bin$_v = i$ write 1. Then all the processors read this register. If the outcome of reading is 1,
then all write 0 again, and this outcome indicates that there is at least one ball in the bin; otherwise
there is no ball.
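The three steps of this test (write, read, reset) can be traced in a few lines. This is a sequential sketch under the assumption that a Python variable stands in for the shared register; the function name `bin_is_nonempty` is hypothetical.

```python
def bin_is_nonempty(bins, i):
    """One constant-time test of bin i, following the write/read/reset steps."""
    register = 0                       # shared register, initialized to 0
    # write step: every processor v with bin_v = i writes 1
    # (all writers write the same value, which a Common PRAM permits)
    for b in bins:
        if b == i:
            register = 1
    # read step: all processors read the register
    outcome = register
    # reset step: if 1 was read, all processors write 0 again
    if outcome == 1:
        register = 0
    return outcome == 1

bins = [3, 1, 3, 5]                    # bin_v for four processors
occupied = [i for i in range(1, 6) if bin_is_nonempty(bins, i)]
print(occupied)
```

Note that the test reports only whether a bin is occupied, not how many balls it holds, which is why a separate collision verification is still needed for nonempty bins.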

A logarithmic-time verification of collision occurs when there is some ball in the corresponding
bin. This triggers calling procedure Verify-Collision precisely $\beta\lg n$ times; notice that this
procedure has the default parameter 1, as only one bin is verified at a time. Ultimately, when a
collision is not detected for some processor $v$ whose ball is in the bin, then this processor increments
Last-Name and assigns its new value as a tentative name. Otherwise, when a collision is detected,
processor $v$ places its ball in a new bin when the last line in Figure 9 is executed.

To prepare for the next round of throwing balls, the variable number-of-bins may be reset.
During one iteration of the main repeat-loop of the pseudocode of algorithm Common-Bounded-MC
in Figure 10, the number of bins is first set to a value that is $O(n\log n)$ by procedure Estimate-Size.
Immediately after that, it is reset to $O(n)$ by the first call of procedure Extend-Names,
in which the instruction number-of-bins ← size is performed. Here, we need to notice that
number-of-bins $= O(n\log n)$ and size $= O(n)$, by the pseudocodes in Figures 8 and 10 and
Lemma 18.

Balls into bins for the second time. In the course of analysis of performance of procedure 
Extend-Names, we consider a balls-into-bins process; we call it simply the ball process. It proceeds 
through stages so that in a stage we have a number of balls which we throw into a number of bins. 
The sets of bins used in different stages are disjoint. The number of balls and bins used in a stage are 
as determined in the pseudocode in Figure 9, which means that there are $n$ balls and the numbers
of bins are as determined by an execution of procedure Estimate-Size, that is, the first stage uses 
number-of-bins bins and subsequent stages use size bins, as returned by Estimate-Size. 

The only difference between the ball process and the actions of procedure Extend-Names 
is that collisions are detected with certainty in the ball process rather than being tested for. In 
particular, the parameter $\beta$ is not involved in the ball process (nor in its name). The ball process
terminates in the first stage in which no multiple bins are produced, so that there are no collisions 
among the balls. 
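A direct simulation makes the ball process concrete. The sketch below assumes perfect collision detection, as in the definition above; the function name `ball_process`, the `max_stages` safety cap, and the sample parameters are illustrative assumptions rather than part of the analysis.

```python
import random

def ball_process(n, number_of_bins, size, rng, max_stages=1000):
    """Return the number of balls thrown in each stage of the ball process."""
    eligible = n
    m = number_of_bins              # the first stage uses number-of-bins bins
    throws_per_stage = []
    for _ in range(max_stages):
        throws_per_stage.append(eligible)
        counts = {}
        for _ in range(eligible):
            b = rng.randrange(m)
            counts[b] = counts.get(b, 0) + 1
        # balls that share a bin stay eligible; single balls drop out
        eligible = sum(c for c in counts.values() if c > 1)
        if eligible == 0:           # no multiple bins: the process terminates
            return throws_per_stage
        m = size                    # subsequent stages use size bins
    raise RuntimeError("ball process did not terminate within max_stages")

rng = random.Random(5)              # arbitrary seed
throws = ball_process(1000, 10240, 3072, rng)
print(throws, sum(throws))
```

The parameters 10240 and 3072 are the values of $k2^k$ and $3\cdot 2^k$ for $k = 10$; the sum of the returned list is the total number of throws that the analysis bounds.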

Lemma 19 The ball process results in all balls ending single in their bins and the number of times
a ball is thrown, summed over all the stages, being $O(n)$, both events occurring with probability
$1 - n^{-\Omega(\log n)}$.


Proof: The argument leverages the property that, in each stage, the number of bins exceeds the
number of balls by at least a logarithmic factor. We will denote the number of bins in a stage by $m$.
This number will take on two values, first $m = k2^k$ returned as number-of-bins by procedure
Estimate-Size and then $m = 3\cdot 2^k$ returned as size by the same procedure Estimate-Size, for
$k \ge 3$. Because $m = k2^k$ in the first stage, and also size $= 3\cdot 2^k > n$, by Lemma 18, we obtain
that $m > \frac{n}{3}\lg\frac{n}{3}$ in the first stage, and that $m$ is at least $n$ in the following stages, with probability
exponentially close to 1.

In the first stage, we throw $\ell_1 = n$ balls into at least $m = \frac{n}{3}\lg\frac{n}{3}$ bins, with large probability.
Conditional on the event that there are at least these many bins, the probability that a given ball
ends the stage single in a bin is

$$m\cdot\frac{1}{m}\left(1-\frac{1}{m}\right)^{\ell_1-1} \ge 1-\frac{\ell_1-1}{m} \ge 1-\frac{n-1}{\frac{n}{3}\lg\frac{n}{3}} \ge 1-\frac{4}{\lg n},$$

for sufficiently large $n$, where we used Bernoulli's inequality. Let $Y_1$ be the number of singleton
bins in the first stage. The expectation of $Y_1$ satisfies

$$E[Y_1] \ge \ell_1\left(1-\frac{4}{\lg n}\right) = n\left(1-\frac{4}{\lg n}\right).$$
To estimate the deviation of $Y_1$ from its expected value $E[Y_1]$ we use the bounded differences
inequality [49, 52]. Let $B_j$ be the bin of ball $b_j$, for $1 \le j \le \ell_1$. Then $Y_1$ is of the form
$Y_1 = h(B_1, \ldots, B_{\ell_1})$, where $h$ satisfies the Lipschitz condition with constant 2, because moving one ball
to a different bin results in changing the value of $h$ by at most 2 with respect to the original value.
The bounded-differences inequality specialized to this instance is as follows, for any $d > 0$:

$$\Pr\left(Y_1 \le E[Y_1] - d\sqrt{\ell_1}\right) \le \exp(-d^2/8). \tag{13}$$

We employ $d = \lg n$, which makes the right-hand side of (13) asymptotically equal to $n^{-\Omega(\log n)}$.

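The number of balls $\ell_2$ eligible for the second stage can be estimated as follows, this bound holding
with probability $1 - n^{-\Omega(\log n)}$:

$$\ell_2 \le \frac{4n}{\lg n} + \lg n\sqrt{n} = \frac{4n}{\lg n}\left(1+\frac{\lg^2 n}{4\sqrt{n}}\right) \le \frac{5n}{\lg n}, \tag{14}$$

for sufficiently large $n$.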
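How large $n$ must be for the last step of (14) can be checked numerically. The sketch below reads (14) as the inequality $\frac{4n}{\lg n} + \lg n\sqrt{n} \le \frac{5n}{\lg n}$, which is equivalent to $\lg^4 n \le n$; the function name is hypothetical and the three sample values of $n$ are arbitrary.

```python
from math import log2, sqrt

def second_stage_bound_holds(n):
    """Check 4n/lg n + lg n * sqrt(n) <= 5n/lg n for a concrete n."""
    lg = log2(n)
    left = 4 * n / lg + lg * sqrt(n)
    right = 5 * n / lg
    return left <= right

results = {n: second_stage_bound_holds(n) for n in (2**10, 2**16, 2**20)}
print(results)
```

Under this reading, the inequality fails for $n = 2^{10}$, holds with equality at $n = 2^{16}$, and holds strictly for larger powers of two, so "sufficiently large" is a substantive requirement here rather than a formality.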

In the second stage, we throw $\ell_2$ balls into $m \ge n$ bins, with large probability. Conditional on
the bound (14) holding, the probability that a given ball ends up single in a bin is

$$m\cdot\frac{1}{m}\left(1-\frac{1}{m}\right)^{\ell_2-1} \ge 1-\frac{\ell_2-1}{m} \ge 1-\frac{5}{\lg n},$$

where we used Bernoulli's inequality. Let $Y_2$ be the number of singleton bins in the second
stage. The expectation of $Y_2$ satisfies

$$E[Y_2] \ge \ell_2\left(1-\frac{5}{\lg n}\right).$$


To estimate the deviation of $Y_2$ from its expected value $E[Y_2]$, we again use the bounded differences
inequality, which specialized to this instance is as follows, for any $d > 0$:

$$\Pr\left(Y_2 \le E[Y_2] - d\sqrt{\ell_2}\right) \le \exp(-d^2/8). \tag{15}$$

We again employ $d = \lg n$, which makes the right-hand side of (15) asymptotically equal to
$n^{-\Omega(\log n)}$. The number of balls $\ell_3$ eligible for the third stage can be bounded from above as
follows, which holds with probability $1 - n^{-\Omega(\log n)}$:

$$\ell_3 \le \frac{6n}{\lg^2 n}, \tag{16}$$

for sufficiently large $n$.

Next, we generalize these estimates. In stages $i$, for $i \ge 2$, among the first $O(\log n)$ ones, we
throw balls into $m \ge n$ bins with large probability. Let $\ell_i$ be the number of balls eligible for such
a stage $i$. We show by induction that $\ell_i$, for $i \ge 3$, can be estimated as follows:

$$\ell_i \le \frac{6n}{\lg^2 n}\cdot 4^{3-i} \tag{17}$$

with probability $1 - n^{-\Omega(\log n)}$. The estimate (16) provides the base of induction for $i = 3$. In the
inductive step, we assume (17), and consider what happens during stage $i \ge 3$ in order to estimate
the number of balls eligible for the next stage $i + 1$.
In stage $i$, we throw $\ell_i$ balls into $m \ge n$ bins, with large probability. Conditional on the
bound (17), the probability that a given ball ends up single in a bin is

$$m\cdot\frac{1}{m}\left(1-\frac{1}{m}\right)^{\ell_i-1} \ge 1-\frac{\ell_i-1}{m} \ge 1-\frac{6\cdot 4^{3-i}}{\lg^2 n},$$

by the inductive assumption, where we also used Bernoulli's inequality. If $Y_i$ is the number of
singleton bins in stage $i$, then its expectation $E[Y_i]$ satisfies

$$E[Y_i] \ge \ell_i\left(1-\frac{6\cdot 4^{3-i}}{\lg^2 n}\right). \tag{18}$$


To estimate the deviation of $Y_i$ from its expected value $E[Y_i]$, we again use the bounded differences
inequality, which specialized to this instance is as follows, for any $d > 0$:

$$\Pr\left(Y_i \le E[Y_i] - d\sqrt{\ell_i}\right) \le \exp(-d^2/8). \tag{19}$$

We employ $d = \lg n$, which makes the right-hand side of (19) asymptotically equal to $n^{-\Omega(\log n)}$.
The number of balls $\ell_{i+1}$ eligible for the next stage $i+1$ can be estimated from above in the following
way, the estimate holding with probability $1 - n^{-\Omega(\log n)}$:


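$$\begin{aligned}
\ell_{i+1} &\le \frac{6\cdot 4^{3-i}}{\lg^2 n}\,\ell_i + \lg n\sqrt{\ell_i}
 = \frac{6\cdot 4^{3-i}}{\lg^2 n}\,\ell_i\left(1+\tfrac{1}{6}\,4^{i-3}\lg^3 n\cdot \ell_i^{-1/2}\right)\\
&\le \frac{6\cdot 4^{3-i}}{\lg^2 n}\cdot\frac{6n}{\lg^2 n}\cdot 4^{3-i}\left(1+\frac{4^{3(i-3)/2}\lg^4 n}{6\sqrt{6n}}\right)\\
&= \frac{6n}{\lg^2 n}\cdot 4^{3-i}\left(\frac{6\cdot 4^{3-i}}{\lg^2 n}+\frac{4^{(i-3)/2}\lg^2 n}{\sqrt{6n}}\right)\\
&\le \frac{6n}{\lg^2 n}\cdot 4^{3-i-1},
\end{aligned}$$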


for sufficiently large $n$ that does not depend on $i$. For the event $Y_i \le E[Y_i] - d\sqrt{\ell_i}$ in the estimate (19)
to be meaningful, it is sufficient if the following estimate holds:

$$\lg n\cdot\sqrt{\ell_i} = o(E[Y_i]).$$

This is the case as long as $\ell_i > \lg^3 n$, because $E[Y_i] = \ell_i(1 + o(1))$ by (18).

To summarize at this point, as long as $\ell_i$ is sufficiently large, that is, $\ell_i > \lg^3 n$, the number
of eligible balls decreases by at least a factor of 4 with probability that is at least $1 - n^{-\Omega(\log n)}$.
It follows that the total number of eligible balls, summed over these stages, is $O(n)$ with this
probability.

After at most $\lg_4 n = \frac{1}{2}\lg n$ such stages, the number of balls becomes at most $\lg^3 n$ with
probability $1 - n^{-\Omega(\log n)}$. This number of stages is half of the number of times the for-loop is
iterated in the pseudocode in Figure 10.
Algorithm Common-Bounded-MC

repeat
    initialize Last-Name ← 0
    (size, number-of-bins) ← Estimate-Size
    for i ← 1 to lg size do
        Extend-Names
        if not Collision-Detected then return


Figure 10: A pseudocode for a processor v of a Common PRAM, where there is a
constant number of shared memory cells. Procedures Estimate-Size and
Extend-Names have their pseudocodes in Figures 8 and 9, respectively. The variables
Last-Name and Collision-Detected are shared.


It remains to consider the stages when $\ell_i \le \lg^3 n$, so that we throw at most $\lg^3 n$ balls into at
least $n$ bins. They all end up in singleton bins with a probability that is at least

$$\left(\frac{n-\lg^3 n}{n}\right)^{\lg^3 n} = \left(1-\frac{\lg^3 n}{n}\right)^{\lg^3 n} \ge 1-\frac{\lg^6 n}{n},$$

by Bernoulli's inequality. So the probability of a collision is at most $\frac{\lg^6 n}{n}$. One stage without
any collision terminates the process. If we repeat such stages $\frac{1}{2}\lg n$ times, without even removing
single balls, then the probability of collisions occurring in all these stages is at most

$$\left(\frac{\lg^6 n}{n}\right)^{\frac{1}{2}\lg n} = n^{-\Omega(\log n)}.$$

This number of stages is half of the number of times the for-loop is iterated in the pseudocode in
Figure 10. The number of eligible balls summed over these final stages is at most $\lg^4 n = o(n)$. □

The following Theorem 10 summarizes the performance of algorithm Common-Bounded-MC
(see the pseudocode in Figure 10) as a Monte Carlo one.

Theorem 10 Algorithm Common-Bounded-MC terminates almost surely. For each $\alpha > 0$, there
exist $\beta > 0$ and $c > 0$ such that the algorithm assigns unique names, works in time at most $cn\ln n$,
and uses at most $cn\ln n$ random bits, each among these properties holding with probability at least
$1 - n^{-\alpha}$.


Proof: One iteration of the main repeat-loop suffices to assign names with probability $1 - n^{-\Omega(\log n)}$,
by Lemma 19. This means that the probability of not terminating by the $i$th iteration is at most
$(n^{-\Omega(\log n)})^i$, which converges to 0 with $i$ growing to infinity.

The algorithm returns duplicate names only when a collision occurs that is not detected by
procedure Verify-Collision. For a given multiple bin, one iteration of this procedure does not
detect collision with probability at most 1/2, by Lemma 1. Therefore $\beta\lg\mathit{size}$ iterations do not
detect collision with probability $O(n^{-\beta})$, by Lemma 18. The number of nonempty bins ever tested
is at most $dn$, for some constant $d > 0$, by Lemma 19, with the suitably large probability. Applying
the union bound results in the estimate $n^{-\alpha}$ on the probability of error for sufficiently large $\beta$.
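The union bound in this step can be written out as a short calculation. It is a sketch under two assumptions taken from the surrounding argument: that $\mathit{size} \ge n$, and that at most $dn$ nonempty bins are ever tested; the concrete threshold $\beta \ge \alpha + 2$ is one convenient choice made here, not a value stated in the proof.

```latex
\Pr[\text{some tested multiple bin escapes detection}]
  \;\le\; dn \cdot 2^{-\beta \lg \mathit{size}}
  \;=\; dn \cdot \mathit{size}^{-\beta}
  \;\le\; d\, n^{1-\beta}
  \;\le\; d\, n^{-\alpha-1}
  \;\le\; n^{-\alpha}
  \qquad \text{for } \beta \ge \alpha + 2 \text{ and } n \ge d .
```

The events in which the two assumptions fail have probabilities $2^{-\Omega(n)}$ and $n^{-\Omega(\log n)}$, which are of lower order than $n^{-\alpha}$.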

The duration of an iteration of the inner for-loop is either constant, in which case we call it short, or
it takes time $O(\log \mathit{size})$, in which case we call it long. First, we estimate the total time spent on short
iterations. This time in the first iteration of the inner for-loop is proportional to number-of-bins
returned by procedure Estimate-Size, which is at most $6n\cdot\lg(6n)$, by Lemma 18. Each of the
subsequent iterations takes time proportional to size, which is at most $6n$, again by Lemma 18. We
obtain that the total number of short iterations is $O(n\log n)$ in the worst case. Next, we estimate
the total time spent on long iterations. One such iteration takes time proportional to $\lg\mathit{size}$,
which is at most $\lg 6n$ with certainty. The number of such iterations is at most $dn$ with probability
$1 - n^{-\Omega(\log n)}$, for some constant $d > 0$, by Lemma 19. We obtain that the total time spent on long
iterations is $O(n\log n)$, with the correspondingly large probability. Combining the estimates for
short and long iterations, we obtain $O(n\log n)$ as a bound on time of one iteration of the main
repeat-loop. One such iteration suffices with probability $1 - n^{-\Omega(\log n)}$, by Lemma 19.

Throwing one ball uses $O(\log n)$ random bits, by Lemma 18. The number of throws is $O(n)$
with the suitably large probability, by Lemma 19. □

Algorithm Common-Bounded-MC is optimal with respect to the following performance metrics:
the expected time $O(n\log n)$, by Theorem [H, the number of random bits $O(n\log n)$, by
Proposition [H, and the probability of error $n^{-\Omega(1)}$, by Proposition 3.


11 Monte Carlo for Common with Unbounded Memory 

We consider naming on a Common PRAM in the case when the amount of shared memory is 
unbounded and the number of processors n is unknown. The Monte Carlo algorithm we propose, 
called Common-Unbounded-MC, is similar to algorithm Common-Bounded-MC in Section 10,
in that it involves a randomized experiment to estimate the number of processors of the PRAM. 
Such an experiment is then followed by repeatedly throwing balls into bins, testing for collisions, 
and throwing again if a collision is detected, until eventually no collisions are detected. 

Algorithm Common-Unbounded-MC has its pseudocode given in Figure 12. The algorithm
is structured as a repeat-loop. An iteration starts by invoking procedure Gauge-Size-MC, whose
pseudocode is in Figure 11. This procedure returns size as an estimate of the number of processors $n$.
Next, a processor chooses randomly a bin in the range [1, 3 size]. Then it verifies
for collisions $\beta\lg\mathit{size}$ times, in such a manner that when a collision is detected then a new bin is selected
from the same range. After these $\beta\lg\mathit{size}$ verifications and possible new selections of bins, another
$\beta\lg\mathit{size}$ verifications follow, but without changing the selected bins. When no collision is detected
in the second segment of $\beta\lg\mathit{size}$ verifications, then this terminates the repeat-loop, which triggers
assigning each processor the rank of the selected bin, by a prefix-like computation. If a collision is
detected in the second segment of $\beta\lg\mathit{size}$ verifications, then this starts another iteration of the
main repeat-loop.

Procedure Gauge-Size-MC returns an estimate of the number $n$ of processors in the form $2^k$,
for some positive integer $k$. It operates by trying various values of $k$, and, for a considered $k$, by
throwing $n$ balls into $2^k$ bins and next counting how many bins contain balls. Such counting is
performed by a prefix-like computation, whose pseudocode is omitted in Figure 11. The additional
Procedure Gauge-Size-MC

k ← 1
repeat
    k ← r(k)
    bin_v ← random integer in [1, 2^k]
until the number of selected values of variable bin is ≤ 2^k/β
return ⌈2^(k+1)/β⌉


Figure 11: A pseudocode for a processor v of a Common PRAM, where the number
of shared memory cells is unbounded. The constant $\beta > 0$ is the same parameter as
in Figure 12, and an increasing function $r(k)$ is also a parameter.


parameter $\beta > 0$ is a number that affects the probability of underestimating $n$.

The way in which selections of numbers $k$ are performed is controlled by function $r(k)$, which is
a parameter. We will consider two instantiations of this function: one is function $r(k) = k + 1$ and
the other is function $r(k) = 2k$.
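Both instantiations can be tried in a sequential simulation of the procedure. This is a minimal sketch: the function name `gauge_size`, the seed, and the sample values $n = 500$ and $\beta = 4$ are assumptions made for illustration, and a set of chosen bins stands in for the omitted prefix-like counting.

```python
import math
import random

def gauge_size(n, beta, r, rng):
    """Sequential simulation of Gauge-Size-MC with schedule function r."""
    k = 1
    while True:
        k = r(k)
        # n processors each select a bin in [1, 2^k]; keep the distinct values
        selected = {rng.randrange(1, 2 ** k + 1) for _ in range(n)}
        if len(selected) <= 2 ** k / beta:
            return math.ceil(2 ** (k + 1) / beta)

rng = random.Random(7)                          # arbitrary seed
n, beta = 500, 4
size_incremental = gauge_size(n, beta, lambda k: k + 1, rng)   # k = 2, 3, 4, ...
size_doubling = gauge_size(n, beta, lambda k: 2 * k, rng)      # k = 2, 4, 8, ...
print(size_incremental, size_doubling)
```

With $r(k) = 2k$ the procedure tries far fewer values of $k$ but may overshoot $n$ by a polynomial factor, which is the trade-off quantified in the lemma that follows.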


Lemma 20 If $r(k) = k + 1$ then the value of size as returned by Gauge-Size-MC satisfies
size $\le 2n$ with certainty and the inequality size $\ge n$ holds with probability $1 - \beta^{-n/3}$.

If $r(k) = 2k$ then the value of size as returned by Gauge-Size-MC satisfies size $\le 2\beta n^2$ with
certainty and size $\ge \beta n^2/2$ with probability $1 - \beta^{-n/3}$.


Proof: We model the procedure's execution by an experiment of throwing $n$ balls into $2^k$ bins. If the
parameter function $r(k)$ is $r(k) = k + 1$ then this results in trying all possible consecutive values
of $k$, starting from $k = 2$, so that $k = i + 1$ in the $i$th iteration of the repeat-loop. If the parameter
function $r(k)$ is $r(k) = 2k$ then $k$ takes on only the powers of 2.


There are at most $n$ bins occupied in any such experiment. Therefore, the procedure returns
by the time the inequality $2^k/\beta \ge n$ holds, where $k$ determines the range of bins. It follows that if
$r(k) = k + 1$ then the returned value $\lceil 2^{k+1}/\beta\rceil$ is at most $2n$. If $r(k) = 2k$ then the worst error in
estimating occurs when $2^i/\beta = n - 1$ for some $i$ that is a power of 2. Then the returned value is
$2^{2i+1}/\beta = 2(\beta(n-1))^2/\beta$, which is at most $2\beta n^2$.


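Given $2^k$ bins, we estimate the probability that the number of occupied bins is at most $2^k/\beta$.
It is

$$\binom{2^k}{2^k/\beta}\left(\frac{2^k/\beta}{2^k}\right)^{n} \le \left(\frac{e2^k}{2^k/\beta}\right)^{2^k/\beta}\beta^{-n} = (e\beta)^{2^k/\beta}\beta^{-n}.$$

Next, we identify a range of values of $k$ for which this probability is exponentially close to 0 with
respect to $n$.

To this end, let $0 < \rho < 1$ and let us consider the inequality

$$(e\beta)^{2^k/\beta}\cdot\beta^{-n} < \rho^{n}. \tag{20}$$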
It is equivalent to the following one

$$\frac{2^k}{\beta}(1+\ln\beta) - n\ln\beta < n\ln\rho$$

by taking logarithms of both sides. This in turn is equivalent to

$$\frac{2^k}{\beta}(1+\ln\beta) < n\left(\ln\beta - \ln\frac{1}{\rho}\right). \tag{21}$$

Let us choose $\rho = \beta^{-1/2}$ in (21). Then (20) specialized to this particular $\rho$ is equivalent to the
following inequality

$$\frac{2^k}{\beta}(1+\ln\beta) < n\cdot\frac{\ln\beta}{2}.$$

This in turn leads to the estimate

$$2^k < n\cdot\frac{\ln\beta}{1+\ln\beta}\cdot\frac{\beta}{2},$$

which means $2^{k+1}/\beta < n$. When $k$ satisfies this inequality then the probability of returning is at
most $\beta^{-n/2}$. There are $O(\log n)$ such values of $k$ considered by the procedure, so it returns for one
of them with probability at most

$$O(\log n)\cdot\beta^{-n/2} \le \beta^{-n/3},$$

for sufficiently large $n$. Therefore, with probability at least $1 - \beta^{-n/3}$, the returned value $\lceil 2^{k+1}/\beta\rceil$
is at least as large as determined by the first considered $k$ that satisfies $2^{k+1}/\beta \ge n$.

If $r(k) = k + 1$ then all the possible exponents $k$ are considered, so the returned value $\lceil 2^{k+1}/\beta\rceil$
is at least $n$ with probability $1 - \beta^{-n/3}$. If $r(k) = 2k$ then the worst error of estimating $n$ occurs
when $2^{i+1}/\beta = n - 1$ for some $i$ that is a power of 2. Then the returned value is

$$2^{2i+1}/\beta = 2\cdot\left(\beta(n-1)/2\right)^2/\beta,$$

which is at least $\beta n^2/2$, this occurring with probability $1 - \beta^{-n/3}$. □

We discuss performance of algorithm Common-Unbounded-MC (see the pseudocode in Figure 12)
by referring to analysis of a related algorithm Common-Unbounded-LV given in Section 7.
We consider a $\beta$-process with verifications, which is defined as follows. The process proceeds through
stages. The first stage starts with placing $n$ balls into 3 size bins. For each of the subsequent stages,
for all multiple bins and for each ball in such a bin, we perform a Bernoulli trial with the probability
$\frac{1}{2}$ of success, which represents the outcome of procedure Verify-Collision. A success in a
trial is referred to as a positive verification; otherwise it is a negative one. If at least one positive
verification occurs for a ball in a multiple bin then all the balls in this bin are relocated in this
stage to bins selected uniformly at random and independently for each such ball; otherwise the
balls stay put in this bin until the next stage. The process terminates when all balls are single in
their bins.
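The process just defined translates almost line by line into a simulation. The sketch below is illustrative only: the function name `beta_process`, the `max_stages` cap, and the sample parameters are assumptions, and each Bernoulli trial is drawn with success probability exactly 1/2.

```python
import random

def beta_process(n, size, rng, max_stages=10_000):
    """Simulate the process with verifications; return (stages, ball throws)."""
    m = 3 * size
    position = [rng.randrange(m) for _ in range(n)]   # first stage: n balls, 3*size bins
    throws = n
    for stage in range(1, max_stages + 1):
        counts = {}
        for b in position:
            counts[b] = counts.get(b, 0) + 1
        if all(c == 1 for c in counts.values()):      # every ball is single: terminate
            return stage, throws
        # one Bernoulli(1/2) trial per ball in a multiple bin; a bin is
        # "detected" when at least one of its balls has a positive verification
        detected = {b for b in position if counts[b] > 1 and rng.random() < 0.5}
        for v in range(n):
            if position[v] in detected:               # relocate all balls of that bin
                position[v] = rng.randrange(m)
                throws += 1
    raise RuntimeError("no termination within max_stages")

stages, throws = beta_process(1000, 1000, random.Random(3))
print(stages, throws)
```

A multiple bin holding $c$ balls escapes detection in a stage with probability $2^{-c} \le 1/4$, so undetected collisions tend to persist for only a few stages.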


Lemma 21 For any number $\alpha > 0$, there exists $\beta > 0$ such that the $\beta$-process with verifications
terminates within $\beta\lg n$ stages with all of them comprising the total of $O(n)$ ball throws with probability
at least $1 - n^{-\alpha}$.
Algorithm Common-Unbounded-MC

repeat
    size ← Gauge-Size-MC
    bin_v ← random integer in [1, 3 size]
    for i ← 1 to β lg size do
        if Verify-Collision(bin_v) then
            bin_v ← random integer in [1, 3 size]
    Collision-Detected ← false
    for i ← 1 to β lg size do
        if Verify-Collision(bin_v) then
            Collision-Detected ← true
until not Collision-Detected
name_v ← the rank of bin_v among selected bins


Figure 12: A pseudocode for a processor v of a Common PRAM, where the number
of shared memory cells is unbounded. The constant $\beta > 0$ is a parameter impacting
the probability of error. The private variable name stores the acquired name.


Proof: We use the analysis of a ball process relevant to Common PRAM with unbounded memory
given in Section 7. The constant 3 determining our $\beta$-process with verifications corresponds to
$1 + \beta$ in Section 7. The corresponding $\beta$-process with verifications considered in Section 7 is defined
by referring to known $n$. We use the approximation size instead, which is at least as large as $n$ with
probability $1 - \beta^{-n/3}$, by Lemma 20 just proved. By Lemma 10, our $\beta$-process with verifications
does not terminate within $\beta\lg n$ stages when size $\ge n$ with probability at most $n^{-2\alpha}$, and the
inequality size $\ge n$ does not hold with probability at most $\beta^{-n/3}$. Therefore the conclusion we
want to prove does not hold with probability at most $n^{-2\alpha} + \beta^{-n/3}$, which is at most $n^{-\alpha}$ for
sufficiently large $n$. □

The following Theorem 11 summarizes the performance of algorithm Common-Unbounded-MC
(see the pseudocode in Figure 12) as a Monte Carlo one. Its proof relies on mapping an execution of
the $\beta$-process with verifications on executions of algorithm Common-Unbounded-MC in a natural
manner.

Theorem 11 Algorithm Common-Unbounded-MC terminates almost surely, for a sufficiently
large $\beta$. For each $\alpha > 0$, there exist $\beta > 0$ and $c > 0$ such that the algorithm assigns unique
names and has the following additional properties with probability $1 - n^{-\alpha}$. If $r(k) = k + 1$ then at
most $cn$ memory cells are ever needed, $cn\ln^2 n$ random bits are ever generated, and the algorithm
terminates in time $O(\log^2 n)$. If $r(k) = 2k$ then at most $cn^2$ memory cells are ever needed, $cn\ln n$
random bits are ever generated, and the algorithm terminates in time $O(\log n)$.

Proof: For a given $\alpha > 0$, let us take $\beta$ that exists by Lemma 21. When the $\beta$-process with
verifications terminates then this models assigning unique names by the algorithm. It follows that
one iteration of the repeat-loop results in the algorithm terminating with proper names assigned
with probability $1 - n^{-\alpha}$. One iteration of the main repeat-loop does not result in termination with
probability at most $n^{-\alpha}$, so $i$ iterations are not sufficient to terminate with probability at most
$n^{-i\alpha}$. This converges to 0 with increasing $i$, so the algorithm terminates almost surely.
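The convergence claim can be spelled out as a geometric bound. The following is a sketch, assuming that each iteration of the repeat-loop fails to terminate with probability at most $n^{-\alpha}$ regardless of the outcomes of earlier iterations, and that $n \ge 2$; the symbol $T$ for the number of iterations performed is introduced here for illustration.

```latex
\Pr[T > i] \;\le\; \left(n^{-\alpha}\right)^{i} \;=\; n^{-i\alpha} \;\longrightarrow\; 0
\quad (i \to \infty),
\qquad
E[T] \;=\; \sum_{i \ge 0} \Pr[T > i] \;\le\; \sum_{i \ge 0} n^{-i\alpha} \;=\; \frac{1}{1 - n^{-\alpha}} .
```

Under these assumptions the expected number of iterations is bounded by a constant close to 1, which is consistent with the performance bounds being dominated by a single iteration.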

The performance metrics rely mostly on Lemma 20. We consider two cases, depending on which
function $r(k)$ is used.

If $r(k) = k + 1$ then procedure Gauge-Size-MC considers all the consecutive values of $k$ up
to $\lg n$, and for each such $k$, throwing a ball requires $k$ random bits. We obtain that procedure
Gauge-Size-MC uses $O(n\log^2 n)$ random bits. Similarly, to compute the number of selected values
in an iteration of the main repeat-loop of this procedure takes time $O(k)$, for the corresponding $k$,
so this procedure takes $O(\log^2 n)$ time. The value of size satisfies size $\le 2n$ with certainty.
Therefore, $O(n)$ memory registers are ever needed, while one throw of a ball uses $O(\log n)$ random
bits, after size has been computed. It follows that one iteration of the main repeat-loop of the
algorithm, after procedure Gauge-Size-MC has been completed, uses $O(n\log n)$ random bits, by
Lemmas 20 and 21, and takes $O(\log n)$ time. Since one iteration of the main repeat-loop suffices
with probability $1 - n^{-\alpha}$, the overall time is dominated by the time performance of procedure
Gauge-Size-MC.

If $r(k) = 2k$ then procedure Gauge-Size-MC considers all the consecutive powers of 2 as values
of $k$ up to $\lg n$, and for each such $k$, throwing a ball requires $k$ random bits. Since the values $k$
form a geometric progression, procedure Gauge-Size-MC uses $O(\log n)$ random bits per processor.
Similarly, to compute the number of selected values in an iteration of the main repeat-loop of this
procedure takes time $O(k)$, for the corresponding $k$ that increase geometrically, so this procedure
takes $O(\log n)$ time. The value of size satisfies size $\le 2\beta n^2$ with certainty. By Lemma 20, $O(n^2)$
memory registers are ever needed, so one throw of a ball uses $O(\log n)$ random bits. One iteration
of the main repeat-loop, after procedure Gauge-Size-MC has been completed, uses $O(n\log n)$
random bits, by Lemmas 20 and 21, and takes $O(\log n)$ time. □
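The difference between the two schedules comes down to summing the values of $k$ that are tried, since throwing a ball for a given $k$ costs $k$ random bits per processor. The sketch below uses a hypothetical stopping threshold `k_max` in place of the random stopping point of the procedure; the function name `k_values` is likewise an assumption.

```python
def k_values(r, k_max):
    """Values of k tried when starting from k = 1, stopping once k >= k_max."""
    k, tried = 1, []
    while True:
        k = r(k)
        tried.append(k)
        if k >= k_max:
            return tried

k_max = 20                                        # stands in for roughly lg n
incremental = k_values(lambda k: k + 1, k_max)    # 2, 3, ..., 20
doubling = k_values(lambda k: 2 * k, k_max)       # 2, 4, 8, 16, 32
bits_incremental = sum(incremental)               # grows quadratically in k_max
bits_doubling = sum(doubling)                     # less than twice the last k
print(bits_incremental, bits_doubling)
```

The arithmetic progression sums to a quantity quadratic in the final $k$, whereas the geometric progression sums to less than twice its last term, which is the source of the $\log^2 n$ versus $\log n$ gap per processor.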

The instantiations of algorithm Common-Unbounded-MC are close to optimality with respect
to some of the performance metrics we consider, depending on whether $r(k) = k + 1$ or $r(k) = 2k$.
If $r(k) = k + 1$ then the algorithm's use of shared memory would be optimal if its time were
$O(\log n)$, by Theorem 2, but it misses space optimality by at most a logarithmic factor, since the
algorithm's time is $O(\log^2 n)$. Similarly, for this case of $r(k) = k + 1$, the number of random bits
ever generated $O(n\log^2 n)$ misses optimality by at most a logarithmic factor, by Proposition 1. In
the other case of $r(k) = 2k$, the expected time $O(\log n)$ is optimal, by Theorem 3, the expected
number of random bits $O(n\log n)$ is optimal, by Proposition [H, and the probability of error $n^{-\Omega(1)}$
is optimal, by Proposition 3, but the amount of used shared memory misses optimality by at most
a polynomial factor, by Theorem 3.


12 Conclusion 

We considered the naming problem for the anonymous synchronous PRAM when the number of 
processors n is known. We gave Las Vegas algorithms for four variants of the problem, which are 
determined by the suitable restrictions on concurrent writing and the amount of shared memory. 
Each of these algorithms is provably optimal for its case with respect to the natural performance
metrics such as expected time (as determined by the amount of shared memory) and expected 
number of used random bits. 

We also considered four variants of the naming problem for an anonymous PRAM, when the 
number of processors n is unknown, and developed Monte Carlo naming algorithms for each of them. 
The two algorithms for a bounded number of shared registers are provably optimal with respect to 
the following three performance metrics: expected time, expected number of generated random bits 
and probability of error. It is an open problem to develop Monte Carlo algorithms for Arbitrary and 
Common PRAMs for the case when the amount of shared memory is unbounded, such that they 
are simultaneously asymptotically optimal with respect to these same three performance metrics: 
the expected time, the expected number of generated random bits and the probability of error. 

The algorithms we gave cover the “boundary” cases of the model. One case is about a minimum 
amount of shared memory, that is, when only a constant number of shared memory cells are available.
The other case is about a minimum expected running time, that is, when the expected running
time is O(logn); such performance requires a number of shared registers that grows unbounded 
with n. It would be interesting to have the results of this paper generalized by investigating naming 
on a PRAM when the number of processors and the number of shared registers are independent 
parameters of the model. 